Winter 2012-2013 Compiler Principles: Loop Optimizations and Register Allocation
Mayer Goldberg and Roman Manevich, Ben-Gurion University


Today
• Review (global) dataflow analysis
  – Join semilattices
• Monotone dataflow frameworks
  – Termination
  – Distributive transfer functions: join over all paths
• Loop optimizations
  – Introduce reaching definitions analysis
  – Loop-invariant code motion
  – (Strength reduction via induction variables)
• Register allocation by graph coloring
  – From liveness to the register interference graph
  – Heuristics for graph coloring

Liveness Analysis

• A variable is live at a point in a program if, along some path from that point, its current value may be read before the variable is written again

Join semilattice definition
• A join semilattice is a pair $(V, \sqcup)$, where
  – $V$ is a domain of elements
  – $\sqcup$ is a join operator that is
    – commutative: $x \sqcup y = y \sqcup x$
    – associative: $(x \sqcup y) \sqcup z = x \sqcup (y \sqcup z)$
    – idempotent: $x \sqcup x = x$
• If $x \sqcup y = z$, we say that $z$ is the join (or Least Upper Bound) of $x$ and $y$
• The semilattices used for dataflow analysis have a bottom element, denoted $\bot$, such that $\bot \sqcup x = x$ for all $x$

Partial ordering induced by join

• Every join semilattice $(V, \sqcup)$ induces an ordering relationship over its elements
• Define $x \sqsubseteq y$ iff $x \sqcup y = y$
• Need to prove
  – Reflexivity: $x \sqsubseteq x$
  – Antisymmetry: if $x \sqsubseteq y$ and $y \sqsubseteq x$, then $x = y$
  – Transitivity: if $x \sqsubseteq y$ and $y \sqsubseteq z$, then $x \sqsubseteq z$
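Each of the three properties follows from the semilattice laws in a single line:

```latex
\begin{aligned}
&\text{Reflexivity:}  && x \sqcup x = x \ \text{(idempotence)}, \text{ so } x \sqsubseteq x \\
&\text{Antisymmetry:} && x = y \sqcup x = x \sqcup y = y
   \quad (y \sqsubseteq x,\ \text{commutativity},\ x \sqsubseteq y) \\
&\text{Transitivity:} && x \sqcup z = x \sqcup (y \sqcup z) = (x \sqcup y) \sqcup z = y \sqcup z = z
   \quad (y \sqsubseteq z,\ \text{associativity},\ x \sqsubseteq y,\ y \sqsubseteq z)
\end{aligned}
```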

A join semilattice for liveness
• Sets of live variables and the set union operation
• Idempotent:
  – $x \cup x = x$
• Commutative:
  – $x \cup y = y \cup x$
• Associative:
  – $(x \cup y) \cup z = x \cup (y \cup z)$
• Bottom element:
  – The empty set: $\emptyset \cup x = x$
• Ordering over elements = subset relation: $x \sqsubseteq y$ iff $x \subseteq y$

Join semilattice example for liveness (variables a, b, c), from the bottom element up:

{a, b, c}
{a, b}   {a, c}   {b, c}
{a}   {b}   {c}
{}   (bottom element)
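For the liveness semilattice, the laws and the induced ordering can be checked mechanically on small examples. A minimal Python sketch, assuming sets of variables are represented as frozensets (the helper names are illustrative, not from the course), that brute-forces every law over all subsets of {a, b, c}:

```python
from itertools import chain, combinations

def powerset(variables):
    """All subsets of `variables`, as frozensets."""
    items = list(variables)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def check_union_semilattice(variables):
    """Assert the join-semilattice laws for (subsets, union) and that the
    induced ordering (x ⊑ y iff x ∪ y = y) is exactly the subset relation.
    Returns the number of elements checked."""
    V = powerset(variables)
    bottom = frozenset()
    for x in V:
        assert x | x == x                          # idempotent
        assert bottom | x == x                     # empty set is the bottom element
        for y in V:
            assert x | y == y | x                  # commutative
            assert ((x | y) == y) == (x <= y)      # induced order = subset
            for z in V:
                assert (x | y) | z == x | (y | z)  # associative
    return len(V)

print(check_union_semilattice({"a", "b", "c"}))
```

The call returns 8, the number of sets in the lattice for three variables, and raises `AssertionError` if any law fails.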

Dataflow framework

• A global analysis is a tuple $(D, V, \sqcup, F, I)$, where
  – $D$ is a direction (forward or backward)
    – The order to visit statements within a basic block, NOT the order in which to visit the basic blocks
  – $V$ is a set of values (sometimes called the domain)
  – $\sqcup$ is a join operator over those values
  – $F$ is a set of transfer functions $f_s : V \to V$ (for every statement $s$)
  – $I$ is an initial value

Running global analyses
• Assume that $(D, V, \sqcup, F, I)$ is a forward analysis
• For every statement s maintain values before, IN[s], and after, OUT[s]
• Set $\mathrm{OUT}[s] = \bot$ for all statements s
• Set $\mathrm{OUT}[\mathit{entry}] = I$
• Repeat until no values change:
  – For each statement s with predecessors $\mathrm{PRED}[s] = \{p_1, p_2, \dots, p_n\}$
    • Set $\mathrm{IN}[s] = \mathrm{OUT}[p_1] \sqcup \mathrm{OUT}[p_2] \sqcup \dots \sqcup \mathrm{OUT}[p_n]$
    • Set $\mathrm{OUT}[s] = f_s(\mathrm{IN}[s])$
• The order of this iteration does not matter
  – Chaotic iteration
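A minimal Python sketch of the forward algorithm above, assuming the CFG is given as a predecessor map over statement ids and that values are hashable and comparable with `==` (all names are illustrative). It visits statements in list order, one of the orders chaotic iteration allows. Like the algorithm it sketches, it is only guaranteed to stop under the conditions discussed in the following slides.

```python
def solve_forward(nodes, preds, transfer, join, bottom, entry, init):
    """Chaotic iteration for a forward dataflow analysis.

    nodes:    list of statement ids
    preds:    dict id -> list of predecessor ids
    transfer: dict id -> function V -> V (the f_s)
    join:     binary join operator on V
    bottom:   the bottom element of V
    entry:    id of the entry node, whose OUT is fixed to `init` (I)
    """
    IN = {n: bottom for n in nodes}
    OUT = {n: bottom for n in nodes}      # OUT[s] = bottom for all s
    OUT[entry] = init                     # OUT[entry] = I
    changed = True
    while changed:                        # repeat until no values change
        changed = False
        for s in nodes:                   # any visiting order is allowed
            if s == entry:
                continue
            new_in = bottom               # bottom is the identity of join
            for p in preds[s]:
                new_in = join(new_in, OUT[p])
            new_out = transfer[s](new_in)
            if new_in != IN[s] or new_out != OUT[s]:
                IN[s], OUT[s] = new_in, new_out
                changed = True
    return IN, OUT

# Tiny usage: entry -> s1 -> s2, where s2 also loops back to itself.
IN, OUT = solve_forward(
    nodes=["entry", "s1", "s2"],
    preds={"entry": [], "s1": ["entry"], "s2": ["s1", "s2"]},
    transfer={"entry": lambda v: v,
              "s1": lambda v: v | {"d1"},
              "s2": lambda v: v | {"d2"}},
    join=lambda a, b: a | b,
    bottom=frozenset(),
    entry="entry",
    init=frozenset(),
)
print(sorted(OUT["s2"]))
```

With set union as the join, the loop stabilizes after a few passes and `OUT["s2"]` is `{"d1", "d2"}`.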


Proving termination

• Our algorithm for running these analyses continuously loops until no changes are detected

• Problem: how do we know the analyses will eventually terminate?

A non-terminating analysis

• The following analysis will loop infinitely on any CFG containing a loop:
• Direction: Forward
• Domain: ℕ
• Join operator: max
• Transfer function: f(n) = n + 1
• Initial value: 0

A non-terminating analysis: worked example
• CFG: start → x = y → end, with a back edge that feeds OUT[x = y] back into IN[x = y]
• Initialization: every OUT value is 0
• Fixed-point iteration, choosing the block x = y each time:
  – Iteration 1: IN[x = y] = max(0, 0) = 0, so OUT[x = y] = 1
  – Iteration 2: IN[x = y] = max(0, 1) = 1, so OUT[x = y] = 2
  – Iteration 3: IN[x = y] = max(0, 2) = 2, so OUT[x = y] = 3
  – … the value on the back edge grows by one on every pass, so the iteration never stabilizes
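A minimal Python simulation of this run, assuming the CFG start → x = y → end with a self-loop on x = y (the function name is hypothetical). A real solver would never stop, so the sketch caps the number of rounds and records OUT[x = y] after each one:

```python
def run_max_plus_one(rounds):
    """Simulate the analysis (domain N, join = max, f(n) = n + 1, I = 0)
    on start -> s -> end, where s feeds its own OUT back into its IN.
    Returns OUT[s] after each round."""
    out_start = 0                       # OUT[start] = initial value I
    out_s = 0                           # OUT[s] initialized to 0 as well
    history = []
    for _ in range(rounds):             # capped: the values never stabilize
        in_s = max(out_start, out_s)    # join over start and the back edge
        out_s = in_s + 1                # transfer function f(n) = n + 1
        history.append(out_s)
    return history

print(run_max_plus_one(5))
```

`run_max_plus_one(5)` returns `[1, 2, 3, 4, 5]`: no matter how many rounds are allowed, the last value is still new.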

Why doesn’t this terminate?
• Values can increase without bound: 0, 1, 2, 3, 4, …
• Note that “increase” refers to the lattice ordering, not the ordering on the natural numbers
• The height of a semilattice is the length of the longest increasing sequence in that semilattice
• The dataflow framework is not guaranteed to terminate for semilattices of infinite height
• Note that a semilattice can be infinitely large but have finite height
  – e.g. constant propagation

Height of a lattice
• An increasing chain is a sequence of elements $a_1 \sqsubset a_2 \sqsubset \dots \sqsubset a_k$
  – The length of such a chain is k
• The height of a lattice is the length of the maximal increasing chain
• For liveness with n program variables:
  – $\{\} \sqsubset \{v_1\} \sqsubset \{v_1, v_2\} \sqsubset \dots \sqsubset \{v_1, \dots, v_n\}$
• For available expressions it is the number of expressions of the form a = b op c
  – For n program variables and m operator types: $m \cdot n^3$

Another non-terminating analysis

• This analysis works on a finite-height semilattice, but will not terminate on certain CFGs:
• Direction: Forward
• Domain: Boolean values true and false
• Join operator: Logical OR
• Transfer function: Logical NOT
• Initial value: false

A non-terminating analysis: worked example
• Same CFG: start → x = y → end, with a back edge that feeds OUT[x = y] back into IN[x = y]
• Initialization: every OUT value is false
• Fixed-point iteration, choosing the block x = y each time:
  – Iteration 1: IN[x = y] = false OR false = false, so OUT[x = y] = NOT false = true
  – Iteration 2: IN[x = y] = false OR true = true, so OUT[x = y] = NOT true = false
  – Iteration 3: IN[x = y] = false OR false = false, so OUT[x = y] = true
  – … OUT[x = y] alternates between true and false forever
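The same kind of capped simulation shows the cycle, again assuming a single block x = y with a self-loop (the function name is hypothetical):

```python
def run_or_not(rounds):
    """Simulate the analysis (domain {False, True}, join = OR, f = NOT,
    I = False) on start -> s -> end, where s feeds its OUT back into its IN.
    Returns OUT[s] after each round."""
    out_start = False                   # OUT[start] = initial value
    out_s = False                       # OUT[s] initialized to False
    history = []
    for _ in range(rounds):             # capped: the values cycle forever
        in_s = out_start or out_s       # join: logical OR
        out_s = not in_s                # transfer function: logical NOT
        history.append(out_s)
    return history

print(run_or_not(4))
```

`run_or_not(4)` returns `[True, False, True, False]`: the lattice has height 2, yet the values keep flipping because NOT pushes them back down.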

Why doesn’t it terminate?

• Values can loop indefinitely: false, true, false, true, false, …
• Intuitively, the join operator keeps pulling values up
• If the transfer function can keep pushing values back down again, then the values might cycle forever
• How can we fix this?

Monotone transfer functions
• A transfer function f is monotone iff, whenever $x \sqsubseteq y$, $f(x) \sqsubseteq f(y)$
• Intuitively, if you know less information about a program point, you can't “gain back” more information about that program point
• Many transfer functions are monotone, including those for liveness and constant propagation
• Note: Monotonicity does not mean that $x \sqsubseteq f(x)$
  – (This is a different property called extensivity)

Liveness and monotonicity

• A transfer function f is monotone iff, whenever $x \sqsubseteq y$, $f(x) \sqsubseteq f(y)$
• Recall our transfer function for a = b + c is
  – $f_{a = b + c}(V) = (V \setminus \{a\}) \cup \{b, c\}$
• Recall that our join operator is set union and induces an ordering relationship $X \sqsubseteq Y$ iff $X \subseteq Y$
• Is this monotone?
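It is: if $X \subseteq Y$ then $X \setminus \{a\} \subseteq Y \setminus \{a\}$, and adding $\{b, c\}$ to both sides preserves the inclusion. For small cases the question can also be answered by brute force. A Python sketch (helper names are illustrative; checking finitely many inputs illustrates monotonicity rather than proving it), contrasted with logical NOT from the Boolean analysis:

```python
from itertools import chain, combinations

def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_monotone(f, domain, leq):
    """Brute force: for every x ⊑ y in `domain`, check f(x) ⊑ f(y)."""
    return all(leq(f(x), f(y)) for x in domain for y in domain if leq(x, y))

def f_live(V):
    """Liveness transfer function for the statement a = b + c."""
    return (V - {"a"}) | {"b", "c"}

domain = subsets({"a", "b", "c", "d"})
print(is_monotone(f_live, domain, lambda X, Y: X <= Y))       # True

# The Boolean analysis: join = OR induces the ordering False ⊑ True.
def bool_leq(x, y):
    return (x or y) == y

print(is_monotone(lambda x: not x, [False, True], bool_leq))  # False
```

NOT fails because False ⊑ True but NOT False = True is not ⊑ NOT True = False, which is exactly how the Boolean analysis pushes values back down.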

Is constant propagation monotone?

• A transfer function f is monotone iff, whenever $x \sqsubseteq y$, $f(x) \sqsubseteq f(y)$
• Recall our transfer functions
  – $f_{x=k}(V) = V|_{x \mapsto k}$ (update V by mapping x to k)
  – $f_{x=a+b}(V) = V|_{x \mapsto \text{Not-a-Constant}}$ (assign Not-a-Constant)
• Is this monotone?

Value lattice for one variable: Undefined at the bottom; the constants …, -2, -1, 0, 1, 2, … (mutually incomparable) in the middle; Not-a-constant at the top
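Both functions are monotone: each overwrites x with a value that does not depend on V and leaves every other variable unchanged. A brute-force Python check over a finite sample of the lattice, assuming two variables x and y and sample constants -1, 0, 1 (an illustration, not a proof; all names are hypothetical):

```python
from itertools import product

UNDEF, NAC = "Undefined", "Not-a-constant"

def join(a, b):
    """Join in the constant-propagation lattice for one variable."""
    if a == UNDEF:
        return b
    if b == UNDEF:
        return a
    return a if a == b else NAC     # two different constants, or anything with NAC

def leq(a, b):
    return join(a, b) == b          # a ⊑ b iff a ⊔ b = b

def env_leq(V, W):
    """Environments (variable -> lattice value) are ordered pointwise."""
    return all(leq(V[v], W[v]) for v in V)

VARS = ("x", "y")
VALUES = (UNDEF, -1, 0, 1, NAC)     # finite sample of the infinite lattice
ENVS = [dict(zip(VARS, vals)) for vals in product(VALUES, repeat=len(VARS))]

def f_assign_const(k):
    return lambda V: {**V, "x": k}  # f_{x=k}(V) = V|x->k

def f_assign_sum(V):
    return {**V, "x": NAC}          # f_{x=a+b}(V) = V|x->Not-a-Constant

def is_monotone(f):
    return all(env_leq(f(V), f(W)) for V in ENVS for W in ENVS if env_leq(V, W))

print(is_monotone(f_assign_const(1)), is_monotone(f_assign_sum))
```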

The grand result

• Theorem: A dataflow analysis with a finite-height semilattice and a family of monotone transfer functions always terminates
• Proof sketch:
  – The join operator can only bring values up
  – Transfer functions can never lower values back down below where they were in the past (monotonicity)
  – Values cannot increase indefinitely (finite height)

An “optimality” result
• A transfer function f is distributive if $f(a \sqcup b) = f(a) \sqcup f(b)$ for all domain elements a and b
• If all transfer functions are distributive, then the fixed-point solution is equal to the solution computed by joining results from all (potentially infinite) control-flow paths
  – Join over all paths
• Optimal if we ignore program conditions, i.e., if we pretend all control-flow paths can be executed by the program
• Which analyses use distributive functions?
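Transfer functions of the gen/kill form $f(V) = (V \setminus \mathit{kill}) \cup \mathit{gen}$, which covers liveness and reaching definitions, are distributive over union, because $(A \cup B) \setminus K = (A \setminus K) \cup (B \setminus K)$. A brute-force Python check for the liveness function of a = b + c (a sketch; helper names are illustrative):

```python
from itertools import chain, combinations

def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def gen_kill(gen, kill):
    """Transfer function of the form f(V) = (V - kill) | gen."""
    return lambda V: (V - kill) | gen

def is_distributive(f, domain, join):
    """Brute force: f(a ⊔ b) == f(a) ⊔ f(b) for every pair in `domain`."""
    return all(f(join(a, b)) == join(f(a), f(b)) for a in domain for b in domain)

domain = subsets({"a", "b", "c"})
union = lambda a, b: a | b
f_live = gen_kill(gen=frozenset({"b", "c"}), kill=frozenset({"a"}))  # a = b + c
print(is_distributive(f_live, domain, union))   # True
```

By contrast, constant propagation whose transfer functions actually evaluate a + b on constant operands is the standard example of a non-distributive analysis.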

Loop optimizations
• Most of a program’s computations are done inside loops
  – Focus optimization effort on loops
• The optimizations we’ve seen so far are independent of the control structure
• Some optimizations are specialized to loops
  – Loop-invariant code motion
  – (Strength reduction via induction variables)
• These require another type of analysis to find out where expressions get their values from
  – Reaching definitions
• (Also useful for improving register allocation)

Loop invariant computation

• Example CFG:
  – start → y = …; t = …; z = …
  – → loop header: y = t * 4; test x < y + z
  – while x < y + z: x = x + 1, then back to the loop header
  – otherwise: end
• t * 4 and y + z have the same value on each iteration

Code hoisting

• The invariant computations move into the block before the loop:
  – start → y = …; t = …; z = …; y = t * 4; w = y + z
  – → loop header: test x < w
  – while x < w: x = x + 1, then back to the loop header
  – otherwise: end

What reasoning did we use?

(Example CFG: y = …; t = …; z = … before the loop; y = t * 4 and the test x < y + z in the loop header; x = x + 1 in the loop body.)
• Both t and z are defined only outside of the loop
• Constants are trivially loop-invariant
• y is defined inside the loop, but it is loop-invariant since t * 4 is loop-invariant

What about now?

• The loop body is now x = x + 1; t = t + 1 (the rest of the CFG is unchanged)
• Now t is not loop-invariant, and neither are t * 4 and y

Loop-invariant code motion
• d: t = a1 op a2
  – d is a program location
• a1 op a2 is loop-invariant (for a loop L) if it computes the same value in each iteration
  – Hard to know in general
• Conservative approximation: each ai is
  – a constant, or
  – a variable all of whose definitions that reach d are outside L, or
  – a variable with only one definition reaching d, where that definition is itself loop-invariant
• Transformation: hoist the loop-invariant code outside of the loop

Reaching definitions analysis
• A definition d: t = … reaches a program location if there is a path from the definition to the program location along which the defined variable is never redefined
• Direction: Forward
• Domain: sets of program locations that are definitions
• Join operator: union
• Transfer functions:
  – $f_{d:\, a = b \text{ op } c}(RD) = (RD \setminus \mathit{defs}(a)) \cup \{d\}$
  – $f_{d:\, \text{not-a-def}}(RD) = RD$
  – where defs(a) is the set of locations defining a (statements of the form a = …)
• Initial value: {}
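A minimal Python sketch of this analysis on the running example, assuming a statement-level CFG in which d4: y = t * 4 heads the loop, the test x < y + z follows it, and d5: x = x + 1 loops back to d4 (the encoding and names are hypothetical):

```python
# Each definition maps to the variable it defines; "test" is x < y + z.
DEFINES = {"d1": "y", "d2": "t", "d3": "z", "d4": "y", "d5": "x"}
PREDS = {
    "d1": [], "d2": ["d1"], "d3": ["d2"],
    "d4": ["d3", "d5"],     # loop header: entered from d3 or via the back edge
    "test": ["d4"],
    "d5": ["test"],         # loop body, taken while x < y + z
    "end": ["test"],
}

def defs(var):
    """All locations that define `var`."""
    return {d for d, v in DEFINES.items() if v == var}

def transfer(node, rd):
    if node in DEFINES:                           # d: a = ...
        return (rd - defs(DEFINES[node])) | {node}
    return rd                                     # not a definition

def reaching_definitions():
    IN = {n: frozenset() for n in PREDS}          # initial value {}
    OUT = {n: frozenset() for n in PREDS}
    changed = True
    while changed:                                # iterate to a fixed point
        changed = False
        for n in PREDS:
            new_in = frozenset().union(*(OUT[p] for p in PREDS[n]))  # join = union
            new_out = frozenset(transfer(n, new_in))
            if (new_in, new_out) != (IN[n], OUT[n]):
                IN[n], OUT[n] = new_in, new_out
                changed = True
    return IN, OUT

IN, OUT = reaching_definitions()
print(sorted(IN["d4"]), sorted(IN["end"]))
```

At the fixed point, IN[d4] = {d1, d2, d3, d4, d5} and everything after d4, including end, sees {d2, d3, d4, d5}, because d4 kills d1.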

Reaching definitions analysis: worked example
• The example loop, with each definition labeled:
  – start → d1: y = …; d2: t = …; d3: z = …
  – → loop header: d4: y = t * 4; test x < y + z (a test, not a definition)
  – while x < y + z: d5: x = x + 1, then back to the loop header
  – otherwise: end
• Initialization: every set is {}
• Iteration 1: after d1, d2, d3 the sets are {d1}, {d1, d2}, {d1, d2, d3}
• Iteration 2: IN[d4] = {d1, d2, d3}; d4 kills d1 (the other definition of y), so OUT[d4] = {d2, d3, d4}, which the test passes on unchanged
• Iteration 3: IN[d5] = {d2, d3, d4}, so OUT[d5] = {d2, d3, d4, d5}
• Iteration 4: the back edge brings d5 into the loop header: IN[d4] = {d1, d2, d3} ∪ {d2, d3, d4, d5} = {d1, d2, d3, d4, d5}, and OUT[d4] = {d2, d3, d4, d5}
• Iterations 5 and 6: {d2, d3, d4, d5} propagates through the test, into d5, and to end; after that no value changes
• Fixed point: IN[d4] = {d1, d2, d3, d4, d5}; the sets after d4, before and after d5, and at end are all {d2, d3, d4, d5}

Which expressions are loop-invariant?

(Reaching definitions at the fixed point: IN[d4] = {d1, d2, d3, d4, d5}; everywhere else in the loop and at end: {d2, d3, d4, d5}.)
• t is defined only in d2, outside of the loop
• z is defined only in d3, outside of the loop
• The only definition of y reaching x < y + z is d4, inside the loop, but d4 depends only on t and 4, both loop-invariant
• x is defined only in d5, inside the loop, so it is not loop-invariant

Inferring loop-invariant expressions
• For a statement s of the form t = a1 op a2
• A variable ai is immediately loop-invariant if all reaching definitions IN[s] = {d1, …, dk} for ai are outside of the loop
• LOOP-INV = immediately loop-invariant variables and constants
  $\text{LOOP-INV} = \text{LOOP-INV} \cup \{x \mid d: x = a_1 \text{ op } a_2,\ d \text{ is in the loop},\ a_1, a_2 \in \text{LOOP-INV}\}$
  – Iterate until fixed point
• An expression is loop-invariant if all its operands are loop-invariant
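A minimal Python sketch of this fixed-point computation on the running example. It is a slight variant of the slide's formulation: it tracks which loop definitions are invariant (rather than a set of variable names), so that the "only one reaching definition, itself loop-invariant" condition can be checked directly. The encoding, including the reaching-definitions sets copied from the fixed point above, is hypothetical:

```python
# Target and operands of each definition; LOOP holds the definitions in the loop;
# RD_IN holds the reaching definitions at each use ("test" is x < y + z).
DEFS = {"d1": ("y", ()), "d2": ("t", ()), "d3": ("z", ()),
        "d4": ("y", ("t", "4")), "d5": ("x", ("x", "1"))}
LOOP = {"d4", "d5"}
RD_IN = {"d4": {"d1", "d2", "d3", "d4", "d5"},
         "d5": {"d2", "d3", "d4", "d5"},
         "test": {"d2", "d3", "d4", "d5"}}

def is_const(a):
    return a.lstrip("-").isdigit()

def operand_invariant(a, at, inv_defs):
    """Is operand `a`, used at location `at`, loop-invariant?"""
    if is_const(a):
        return True                         # constants are trivially invariant
    reaching = {d for d in RD_IN[at] if DEFS[d][0] == a}
    if all(d not in LOOP for d in reaching):
        return True                         # all reaching definitions are outside L
    # exactly one reaching definition, and it is itself loop-invariant
    return len(reaching) == 1 and reaching <= inv_defs

def loop_invariant_defs():
    inv = set()
    changed = True
    while changed:                          # iterate until fixed point
        changed = False
        for d in sorted(LOOP - inv):
            if all(operand_invariant(a, d, inv) for a in DEFS[d][1]):
                inv.add(d)
                changed = True
    return inv

inv = loop_invariant_defs()
print(inv)                                                          # {'d4'}
print(all(operand_invariant(a, "test", inv) for a in ("y", "z")))   # True
print(operand_invariant("x", "test", inv))                          # False
```

d4: y = t * 4 is invariant, so y + z in the test is invariant too, while x (defined by d5 from x itself) is not.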

Computing LOOP-INV on the example (same CFG and reaching definitions as in the worked example)
• Immediately loop-invariant: LOOP-INV = {t}, then LOOP-INV = {t, z}; their only definitions, d2 and d3, are outside the loop
• Add constants: LOOP-INV = {t, z, 4}
• d4: y = t * 4 is in the loop and both operands are in LOOP-INV: LOOP-INV = {t, z, 4, y}

Induction variables

while (i < x) {
  j = a + 4 * i
  a[j] = j
  i = i + 1
}

• i is incremented by a loop-invariant expression on each iteration; this is called an induction variable
• j is a linear function of the induction variable with multiplier 4

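Strength reduction

j = a + 4 * i - 4
while (i < x) {
  j = j + 4
  a[j] = j
  i = i + 1
}

• Prepare the initial value before the loop: j starts at a + 4 * i - 4, so the first increment yields a + 4 * i, the value the original loop computes on its first iteration
• Increment by the multiplier: j = j + 4 replaces the multiplication 4 * i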
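A quick way to gain confidence in a strength-reduced loop is to compare it against the original on concrete inputs. A Python sketch that records the index written by a[j] = j on each iteration of both versions (function names are hypothetical; it assumes j is not used after the loop, since the reduced version also sets j when the loop runs zero times):

```python
def original(a, i, x):
    """Indices written by the original loop."""
    writes = []
    while i < x:
        j = a + 4 * i           # multiplication on every iteration
        writes.append(j)        # stands in for a[j] = j
        i = i + 1
    return writes

def reduced(a, i, x):
    """Indices written by the strength-reduced loop."""
    writes = []
    j = a + 4 * i - 4           # prepare the initial value outside the loop
    while i < x:
        j = j + 4               # increment by the multiplier instead
        writes.append(j)
        i = i + 1
    return writes

print(original(100, 0, 5))      # [100, 104, 108, 112, 116]
print(reduced(100, 0, 5))       # [100, 104, 108, 112, 116]
```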

Summary of optimizations
Analysis: enabled optimizations
• Available Expressions: Common-subexpression elimination, Copy Propagation
• Constant Propagation: Constant folding
• Live Variables: Dead code elimination
• Reaching Definitions: Loop-invariant code motion


Global Register Allocation


Registers

• Most machines have a set of registers, dedicated memory locations that
  – can be accessed quickly,
  – can have computations performed on them, and
  – exist in small quantity
• Using registers intelligently is a critical step in any compiler
  – A good register allocator can generate code orders of magnitude better than a bad register allocator

Register allocation
• In TAC, there is an unlimited number of variables
• On a physical machine there is a small number of registers:
  – x86 has four general-purpose registers and a number of specialized registers
  – MIPS has twenty-four general-purpose registers and eight special-purpose registers
• Register allocation is the process of assigning variables to registers and managing data transfer in and out of registers

Challenges in register allocation
• Registers are scarce
  – Often substantially more IR variables than registers
  – Need to find a way to reuse registers whenever possible
• Registers are complicated
  – x86: Each register is made of several smaller registers; can't use a register and its constituent registers at the same time
  – x86: Certain instructions must store their results in specific registers; can't store values there if you want to use those instructions
  – MIPS: Some registers are reserved for the assembler or operating system
  – Most architectures: Some registers must be preserved across function calls

Simple approach

• Straightforward solution:
  – Allocate each variable in the activation record
  – At each instruction, bring the values needed into registers, perform the operation, then store the result to memory
• Example: x = y + z

  mov 16(%ebp), %eax
  mov 20(%ebp), %ebx
  add %ebx, %eax
  mov %eax, 24(%ebp)

• Problem: program execution is very inefficient, because data keeps moving back and forth between memory and registers

Find a register allocation

Registers: eax, ebx

b = a + 2
c = b * b
b = c + 1
return b * a

variable → register: a → ?, b → ?, c → ?

Is this a valid allocation?

Allocation: a → eax, b → ebx, c → eax

b = a + 2       ebx = eax + 2
c = b * b       eax = ebx * ebx
b = c + 1       ebx = eax + 1
return b * a    return ebx * eax

• No: eax = ebx * ebx overwrites the previous value of ‘a’, also stored in eax, which return b * a still needs

Is this a valid allocation?

Allocation: a → ebx, b → eax, c → eax

b = a + 2       eax = ebx + 2
c = b * b       eax = eax * eax
b = c + 1       eax = eax + 1
return b * a    return eax * ebx

• Yes: the value of ‘c’ stored in eax is not needed after b = c + 1, so eax can be reused for ‘b’

Main idea
• For every node n in the CFG, we have out[n]
  – The set of temporaries live out of n
• Two variables interfere if they appear in the same out[n] of any node n
  – They cannot be allocated to the same register
• Conversely, if two variables do not interfere with each other, they can be assigned the same register
  – We say they have disjoint live ranges
• How to assign registers to variables?

Interference graph

• Nodes of the graph = variables
• Edges connect variables that interfere with one another
• Nodes will be assigned a color corresponding to the register assigned to the variable
• Two adjacent nodes cannot have the same color

Interference graph construction

Liveness is computed backward through the code; the set to the left of each statement is the set of variables live just before it:

{a}       b = a + 2
{b, a}    c = b * b
{a, c}    b = c + 1
{b, a}    return b * a

• Interference graph: edges {a, b} and {a, c}; b and c do not interfere
• Colored graph with two colors (eax, ebx): a gets one register, and b and c share the other, e.g. a → ebx and b, c → eax
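A minimal Python sketch of this construction for straight-line code, assuming each statement is encoded as (defined variable, used variables); the encoding and function names are hypothetical. It runs liveness backward and then adds an edge between every pair of variables that appear together in some out[n]:

```python
# Straight-line code from the example: (defined variable or None, used variables)
CODE = [
    ("b", {"a"}),         # b = a + 2
    ("c", {"b"}),         # c = b * b
    ("b", {"c"}),         # b = c + 1
    (None, {"b", "a"}),   # return b * a
]

def liveness(code):
    """Backward liveness for straight-line code: returns (live_in, live_out)."""
    live_in = [set() for _ in code]
    live_out = [set() for _ in code]
    live = set()                              # nothing is live after the return
    for i in range(len(code) - 1, -1, -1):
        defined, used = code[i]
        live_out[i] = set(live)
        live = (live - {defined}) | used      # in = (out - def) ∪ use
        live_in[i] = set(live)
    return live_in, live_out

def interference_edges(live_out):
    """Two variables interfere if they appear together in some out[n]."""
    edges = set()
    for out in live_out:
        for u in out:
            for v in out:
                if u < v:                     # record each unordered pair once
                    edges.add((u, v))
    return edges

live_in, live_out = liveness(CODE)
print([sorted(s) for s in live_in])           # [['a'], ['a', 'b'], ['a', 'c'], ['a', 'b']]
print(sorted(interference_edges(live_out)))   # [('a', 'b'), ('a', 'c')]
```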

Graph coloring

• This problem is equivalent to graph coloring, which is NP-hard if there are at least three registers
• No good polynomial-time algorithms (or even good approximations!) are known for this problem
• We have to be content with a heuristic that is good enough for the register interference graphs (RIGs) that arise in practice

Coloring by simplification [Kempe 1879]

• How to find a k-coloring of a graph
• Intuition:
  – Suppose we are trying to k-color a graph and find a node with fewer than k edges
  – If we delete this node from the graph and color what remains, we can find a color for this node if we add it back in
  – Reason: with fewer than k neighbors, some color must be left over

Page 105: Winter  2012-2013 Compiler  Principles Loop Optimizations and Register Allocation

105

Coloring by simplification [Kempe 1879]

• How to find a k-coloring of a graph
• Phase 1: Simplification
  – Repeatedly simplify the graph by removing a node with fewer than k neighbors
  – When a variable (i.e., graph node) is removed, push it on a stack
• Phase 2: Coloring
  – Unwind the stack and reconstruct the graph as follows:
    • Pop a variable from the stack
    • Add it back to the graph
    • Color the node for that variable with a color not used by any of its neighbors
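The two phases above can be sketched as follows. This is a minimal illustration, not the course's reference implementation: the name `kempe_color`, the integer colors 0..k-1, and the choice to scan nodes in sorted order (any node with fewer than k neighbors would do) are assumptions of the sketch.

```python
def kempe_color(graph, k):
    """Color graph with colors 0..k-1 by simplification; None if stuck.

    graph: {node: set(neighbors)}, undirected.
    """
    # Phase 1: simplification on a working copy of the graph.
    work = {v: set(ns) for v, ns in graph.items()}
    stack = []
    while work:
        node = next((v for v in sorted(work) if len(work[v]) < k), None)
        if node is None:
            return None  # every remaining node has >= k neighbors
        for n in work[node]:
            work[n].discard(node)
        del work[node]
        stack.append(node)

    # Phase 2: coloring; pop nodes in the reverse order of removal.
    coloring = {}
    while stack:
        node = stack.pop()
        # The colored neighbors are exactly those still present when node
        # was removed, and there were fewer than k of them: a color is free.
        used = {coloring[n] for n in graph[node] if n in coloring}
        coloring[node] = min(c for c in range(k) if c not in used)
    return coloring


REGS = ["eax", "ebx"]  # color i stands for register REGS[i]
rig = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
result = {v: REGS[c] for v, c in sorted(kempe_color(rig, 2).items())}
print(result)  # {'a': 'ebx', 'b': 'eax', 'c': 'eax'}
```

On the three-variable example from the interference-graph slides, b and c end up sharing a register, as expected.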

Page 106

Coloring k=2

[Figure: interference graph over a, b, c, d, e, to be colored with k = 2 registers (eax, ebx); stack: empty]

Page 107

Coloring k=2

[Figure: simplify phase; stack (top first): c]

Page 108

Coloring k=2

[Figure: simplify phase; stack (top first): e, c]

Page 109

Coloring k=2

[Figure: simplify phase; stack (top first): a, e, c]

Page 110

Coloring k=2

[Figure: simplify phase; stack (top first): b, a, e, c]

Page 111

Coloring k=2

[Figure: simplify phase done; stack (top first): d, b, a, e, c]

Page 112

Coloring k=2

[Figure: coloring phase; d popped and colored; stack (top first): b, a, e, c]

Page 113

Coloring k=2

[Figure: coloring phase; b popped and colored; stack (top first): a, e, c]

Page 114

Coloring k=2

[Figure: coloring phase; a popped and colored; stack (top first): e, c]

Page 115

Coloring k=2

[Figure: coloring phase; e popped and colored; stack (top first): c]

Page 116

Coloring k=2

[Figure: coloring phase; c popped and colored; stack empty, all nodes colored with eax and ebx]

Page 117

Failure of heuristic

• If the graph cannot be colored, it will eventually be simplified to a graph in which every node has at least K neighbors
• Sometimes, the graph is still K-colorable!
• Finding a K-coloring in all situations is an NP-complete problem
  – We will have to approximate to make register allocators fast enough
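A small illustration of the "still K-colorable" case, as a sketch with hypothetical helper names: in a 4-cycle every node has exactly 2 neighbors, so simplification with K = 2 cannot remove anything, yet a brute-force search (feasible only for tiny graphs) finds a valid 2-coloring.

```python
from itertools import product


def simplification_stuck(graph, k):
    """True if repeatedly removing nodes with < k neighbors cannot empty the graph."""
    work = {v: set(ns) for v, ns in graph.items()}
    while work:
        node = next((v for v in work if len(work[v]) < k), None)
        if node is None:
            return True  # every remaining node has >= k neighbors
        for n in work[node]:
            work[n].discard(node)
        del work[node]
    return False


def has_k_coloring(graph, k):
    """Brute-force check over all k ** n assignments (tiny graphs only)."""
    nodes = sorted(graph)
    for colors in product(range(k), repeat=len(nodes)):
        c = dict(zip(nodes, colors))
        if all(c[u] != c[v] for u in graph for v in graph[u]):
            return True
    return False


# A 4-cycle a-b-c-d-a: every node has exactly 2 neighbors.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(simplification_stuck(cycle, 2))  # True: the heuristic gives up
print(has_k_coloring(cycle, 2))        # True: yet a 2-coloring exists
```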

Page 118

Coloring k=2

[Figure: another interference graph over a, b, c, d, e, to be colored with k = 2 registers (eax, ebx); stack: empty]

Page 119

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure: simplify phase done; stack (top first): c, b, e, a, d]

Page 120

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure: coloring phase; c popped and colored; stack (top first): b, e, a, d]

Page 121

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure: coloring phase; b popped and colored; stack (top first): e, a, d]

Page 122

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure: coloring phase; stack (top first): e, a, d]

No colors left for e!

Page 123

Chaitin’s algorithm

• When every remaining node has K or more neighbors, choose and remove an arbitrary node, marking it “troublesome”
  – Use heuristics to choose which one
  – When adding the node back in, it may be possible to find a valid color
  – Otherwise, we have to spill that node
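The procedure on this slide, which defers the spill decision until the node is popped (often called optimistic coloring), can be sketched as below. The function name and the default `choose_spill=min` (alphabetically smallest node, a placeholder for a real spill heuristic) are assumptions of this sketch.

```python
def optimistic_color(graph, k, choose_spill=min):
    """Simplify/select coloring that defers the spill decision.

    graph: {node: set(neighbors)}, undirected; colors are ints in range(k).
    choose_spill: picks a node when every remaining node has >= k neighbors
    (default: alphabetically smallest, a placeholder for a real heuristic).
    Returns (coloring, spilled).
    """
    work = {v: set(ns) for v, ns in graph.items()}
    stack = []
    while work:
        low = [v for v in sorted(work) if len(work[v]) < k]
        # If no node has < k neighbors, remove one anyway ("troublesome").
        node = low[0] if low else choose_spill(work)
        for n in work[node]:
            work[n].discard(node)
        del work[node]
        stack.append(node)

    coloring, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {coloring[n] for n in graph[node] if n in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            spilled.append(node)  # no color left: this node must be spilled
    return coloring, spilled


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(optimistic_color(triangle, 2)[1])  # ['a']
print(optimistic_color(cycle, 2)[1])     # []
```

The triangle genuinely needs three colors, so one node is spilled; the 4-cycle, on which plain simplification gets stuck, is still colored without spilling because the "troublesome" node finds a free color when it is added back.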

Page 124

Spilling

• Phase 3: spilling
  – Once all nodes have K or more neighbors, pick a node for spilling
    • There are many heuristics that can be used to pick the node
    • Try to pick a node that is not used much and not in an inner loop
    • Its value is stored in the activation record
  – Remove it from the graph
• We can now repeat phases 1-2 without this node
• Better approach: rewrite the code to spill the variable, recompute liveness information, and try to color again
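One way to turn "not used much, not in an inner loop" into a number is a cost-per-degree heuristic. The sketch below is an assumption, not the slides' rule: each use or definition is weighted by 10 raised to its loop-nesting depth, and the total is divided by the node's degree, so cheap nodes with many neighbors are spilled first. The names `spill_cost` and `pick_spill` and the candidate data are hypothetical.

```python
def spill_cost(occurrence_depths, degree, loop_weight=10):
    """Estimated cost of spilling a variable, relative to its degree.

    occurrence_depths: loop-nesting depth of each use or definition.
    Each occurrence is weighted loop_weight ** depth (an assumed weighting),
    so references inside inner loops dominate. Dividing by the node's degree
    favors spilling nodes that would free up many neighbors.
    """
    return sum(loop_weight ** d for d in occurrence_depths) / max(degree, 1)


def pick_spill(candidates):
    """candidates: {var: (occurrence_depths, degree)} -> cheapest var to spill."""
    return min(candidates, key=lambda v: spill_cost(*candidates[v]))


# Hypothetical candidates: x is referenced 3 times inside a loop,
# y twice outside any loop, and z once outside any loop but has only 1 neighbor.
candidates = {"x": ([1, 1, 1], 3), "y": ([0, 0], 3), "z": ([0], 1)}
print(pick_spill(candidates))  # y
```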


Page 125

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure: coloring phase; stack (top first): e, a, d]

No colors left for e!

Page 126

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure: coloring phase; c popped and colored; stack (top first): b, e, a, d]

Page 127

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure: coloring phase; b popped and colored; stack (top first): e, a, d]

Page 128

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure: coloring phase; e popped; stack (top first): a, d]

Page 129

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure: coloring phase; a popped and colored; stack (top first): d]

Page 130

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure: coloring phase; d popped; stack empty]

Page 131

Handling precolored nodes

• Some variables are pre-assigned to registers
  – E.g.: mul on x86/Pentium
    • uses eax; defines eax, edx
  – E.g.: call on x86/Pentium
    • defines (trashes) the caller-save registers eax, ecx, edx
• To allocate registers correctly, treat these register uses as special temporary variables and enter them into the interference graph as precolored nodes
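A minimal sketch of entering a call's clobbered registers into the interference graph. It assumes `live_across_call` is the set of variables live both before and after the call; those variables must not sit in a register the call trashes, so each one gets an edge to every caller-save register node. The function name and the example graph are hypothetical.

```python
CALLER_SAVE = ("eax", "ecx", "edx")  # registers a call trashes, per the slide


def add_call_clobbers(graph, live_across_call, clobbered=CALLER_SAVE):
    """Enter clobbered registers as (precolored) nodes and make every
    variable live across the call interfere with each of them."""
    for r in clobbered:
        graph.setdefault(r, set())
    for v in live_across_call:
        graph.setdefault(v, set())
        for r in clobbered:
            graph[v].add(r)
            graph[r].add(v)
    return graph


# Hypothetical RIG in which a and b interfere, and only a is live across a call.
rig = {"a": {"b"}, "b": {"a"}}
add_call_clobbers(rig, live_across_call={"a"})
print(sorted(rig["a"]))  # ['b', 'eax', 'ecx', 'edx']
```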


Page 132

Handling precolored nodes

• Simplify. Never remove a precolored node: it already has a color, i.e., it is a given register

• Coloring. Once the simplified graph consists only of precolored nodes, add the other nodes back in and color them, using the precolored nodes as the starting point
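The two rules above can be sketched as a variant of simplification-based coloring. This is an illustrative sketch under stated assumptions: the function name is hypothetical, colors are integers in range(k), the precolored assignment is assumed consistent (no two adjacent precolored nodes share a color), and the example graph is made up.

```python
def color_with_precolored(graph, k, precolored):
    """Simplify/select in which precolored nodes are never removed.

    graph: {node: set(neighbors)}; precolored: {node: color}, colors in range(k).
    Returns {node: color}, or None if simplification gets stuck (spill needed).
    """
    coloring = dict(precolored)  # precolored nodes start out colored
    work = {v: set(ns) for v, ns in graph.items()}
    stack = []
    # Simplify: only ordinary (non-precolored) nodes are candidates for removal.
    while any(v not in precolored for v in work):
        node = next((v for v in sorted(work)
                     if v not in precolored and len(work[v]) < k), None)
        if node is None:
            return None
        for n in work[node]:
            work[n].discard(node)
        del work[node]
        stack.append(node)
    # Only precolored nodes remain: add the others back and color them.
    while stack:
        node = stack.pop()
        used = {coloring[n] for n in graph[node] if n in coloring}
        coloring[node] = min(c for c in range(k) if c not in used)
    return coloring


REGS = ["eax", "ebx"]  # color i means register REGS[i]
# Hypothetical RIG: t interferes with register eax (e.g. t is live across
# a mul, which defines eax), and u interferes with t.
rig = {"eax": {"t"}, "t": {"eax", "u"}, "u": {"t"}}
result = color_with_precolored(rig, 2, {"eax": 0})
print({v: REGS[c] for v, c in sorted(result.items())})
# {'eax': 'eax', 't': 'ebx', 'u': 'eax'}
```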


Page 133

Optimizing move instructions

• Code generation produces a lot of extra mov instructions
    mov t5, t9
• If we can assign t5 and t9 to the same register, we can get rid of the mov: effectively, copy elimination at the register allocation level
• Idea: if t5 and t9 are not connected in the interference graph, coalesce them into a single variable; the move becomes redundant
• Problem: coalescing nodes can make a graph uncolorable
  – Conservative coalescing heuristic
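One common conservative coalescing heuristic, usually credited to Briggs, merges two move-related nodes only if the combined node would have fewer than K neighbors of significant degree (degree at least K). Such a merged node can still be removed during simplification, so the merge cannot turn a colorable graph uncolorable. The sketch below is illustrative; the function names and example graphs are hypothetical.

```python
def briggs_can_coalesce(graph, x, y, k):
    """Conservative (Briggs-style) test for merging move-related x and y.

    Merge only if the combined node would have fewer than k neighbors of
    significant degree (>= k), so simplification can still remove it.
    """
    if y in graph[x]:
        return False  # x and y interfere: they can never share a register
    significant = 0
    for n in (graph[x] | graph[y]) - {x, y}:
        degree = len(graph[n])
        if x in graph[n] and y in graph[n]:
            degree -= 1  # n's two edges to x and y become one after merging
        if degree >= k:
            significant += 1
    return significant < k


def coalesce(graph, x, y):
    """Merge y into x in place, so a 'mov x, y' becomes redundant."""
    for n in graph.pop(y):
        graph[n].discard(y)
        graph[n].add(x)
        graph[x].add(n)
    return graph


# Both temporaries interfere only with a: safe to merge.
g1 = {"t5": {"a"}, "t9": {"a"}, "a": {"t5", "t9"}}
print(briggs_can_coalesce(g1, "t5", "t9", 2))  # True
coalesce(g1, "t5", "t9")                       # g1 is now just t5 - a

# Path t5 - a - b - t9 is 2-colorable, but merging would form a triangle.
g2 = {"t5": {"a"}, "a": {"t5", "b"}, "b": {"a", "t9"}, "t9": {"b"}}
print(briggs_can_coalesce(g2, "t5", "t9", 2))  # False
```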

Page 134

Summary of material 1/2 (techniques by compiler task)

Scanning
• Regular expressions
• Finite automata (DFA/NFA)
• Determinization via subset construction
• Maximal munch and precedences
• Automatic scanner generation tools (JFlex)

Parsing
• Context-free grammars
• Leftmost/rightmost derivations, parse trees
• Ambiguity / ambiguity elimination tactics
• LL parsing: building prediction tables (FIRST/FOLLOW), conflicts, left-recursion elimination, recursive descent, automata-based parsing
• Shift-reduce parsing: LR items, transition relation construction, conflicts, SLR, LALR, resolving ambiguity via precedence, automatic parser generation tools (CUP)

Page 135

Summary of material 2/2 (techniques by compiler task)

Lowering to IR
• Three-address code and recursive lowering; Sethi-Ullman translation minimizing the number of temporaries

Optimizations
• Basic blocks, control flow graphs
• Local analysis: transfer functions
• Local analysis vs. global analysis
• Dataflow analysis: join semilattices, partial orderings, monotone transfer functions
• Available expressions, liveness, constant propagation, reaching definitions
• Common-subexpression elimination, copy propagation, constant folding, loop-invariant code motion

Register allocation
• Naïve allocation
• Register interference graph: equivalence to graph coloring
• Graph coloring by simplification
• Chaitin’s algorithm (spilling)

Page 136

Good luck with final project and exams!

I hope some of this was interesting

Advertisement for next semester's course: Program Analysis and Verification