winter 2012-2013 compiler principles loop optimizations and register allocation

Winter 2012-2013
Compiler Principles
Loop Optimizations and Register Allocation

Mayer Goldberg and Roman Manevich
Ben-Gurion University

Today
• Review (global) dataflow analysis
  – Join semilattices
• Monotone dataflow frameworks
  – Termination
  – Distributive transfer functions: join over all paths
• Loop optimizations
  – Introduce reaching definitions analysis
  – Loop-invariant code motion
  – (Strength reduction via induction variables)
• Register allocation by graph coloring
  – From liveness to register interference graph
  – Heuristics for graph coloring


Liveness Analysis

• A variable is live at a point in a program if later in the program its value will be read before it is written to again

Join semilattice definition
• A join semilattice is a pair (V, ⊔), where
  • V is a domain of elements
  • ⊔ is a join operator that is
    – commutative: x ⊔ y = y ⊔ x
    – associative: (x ⊔ y) ⊔ z = x ⊔ (y ⊔ z)
    – idempotent: x ⊔ x = x
• If x ⊔ y = z, we say that z is the join, or least upper bound, of x and y
• Every join semilattice we use has a bottom element, denoted ⊥, such that ⊥ ⊔ x = x for all x

Partial ordering induced by join

• Every join semilattice (V, ⊔) induces an ordering relationship ⊑ over its elements
• Define x ⊑ y iff x ⊔ y = y
• Need to prove
  – Reflexivity: x ⊑ x
  – Antisymmetry: if x ⊑ y and y ⊑ x, then x = y
  – Transitivity: if x ⊑ y and y ⊑ z, then x ⊑ z
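The three properties follow directly from the join laws. A short derivation, writing ⊔ for the join and ⊑ for the induced order (x ⊑ y iff x ⊔ y = y), and using only idempotence, commutativity and associativity:

```latex
\begin{align*}
\text{Reflexivity:}  \quad & x \sqcup x = x \;\Rightarrow\; x \sqsubseteq x \\
\text{Antisymmetry:} \quad & x \sqsubseteq y,\ y \sqsubseteq x
    \;\Rightarrow\; y = x \sqcup y = y \sqcup x = x \\
\text{Transitivity:} \quad & x \sqsubseteq y,\ y \sqsubseteq z
    \;\Rightarrow\; x \sqcup z = x \sqcup (y \sqcup z)
    = (x \sqcup y) \sqcup z = y \sqcup z = z
    \;\Rightarrow\; x \sqsubseteq z
\end{align*}
```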

A join semilattice for liveness
• Sets of live variables and the set union operation
• Idempotent:
  – x ∪ x = x
• Commutative:
  – x ∪ y = y ∪ x
• Associative:
  – (x ∪ y) ∪ z = x ∪ (y ∪ z)
• Bottom element:
  – The empty set: Ø ∪ x = x
• Ordering over elements = subset relation
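The laws can be checked by brute force on a small universe. The following is a minimal sketch, assuming three variables a, b, c; the helper names (`powerset`, `join`, `leq`, `check_semilattice`) are made up for this illustration.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def join(x, y):
    """Join for liveness: set union."""
    return x | y

def leq(x, y):
    """Induced order: x is below y iff join(x, y) == y."""
    return join(x, y) == y

def check_semilattice(values, bottom):
    """Brute-force check of the join-semilattice laws on `values`."""
    for x in values:
        assert join(x, x) == x                      # idempotent
        assert join(bottom, x) == x                 # bottom element
        for y in values:
            assert join(x, y) == join(y, x)         # commutative
            assert leq(x, y) == (x <= y)            # order is the subset relation
            for z in values:
                assert join(join(x, y), z) == join(x, join(y, z))  # associative
    return True

VALUES = powerset({"a", "b", "c"})
print(len(VALUES), check_semilattice(VALUES, frozenset()))  # 8 True
```

A check like this only covers the chosen universe, but the laws hold for set union in general.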

Join semilattice example for liveness

[Figure: Hasse diagram of the subsets of {a, b, c} ordered by inclusion: the bottom element {}, then {a}, {b}, {c}, then {a, b}, {a, c}, {b, c}, then {a, b, c}]

Dataflow framework

• A global analysis is a tuple (D, V, ⊔, F, I), where
  – D is a direction (forward or backward)
    • The order to visit statements within a basic block, NOT the order in which to visit the basic blocks
  – V is a set of values (sometimes called the domain)
  – ⊔ is a join operator over those values
  – F is a set of transfer functions f_s : V → V (one for every statement s)
  – I is an initial value

Running global analyses
• Assume that (D, V, ⊔, F, I) is a forward analysis
• For every statement s maintain values before - IN[s] - and after - OUT[s]
• Set OUT[s] = ⊥ for all statements s
• Set OUT[entry] = I
• Repeat until no values change:
  – For each statement s with predecessors PRED[s] = {p1, p2, …, pn}
    • Set IN[s] = OUT[p1] ⊔ OUT[p2] ⊔ … ⊔ OUT[pn]
    • Set OUT[s] = f_s(IN[s])
• The order of this iteration does not matter
  – Chaotic iteration
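The iteration above can be sketched as a small generic solver. This is an illustration only: the function name `solve_forward`, its parameters, and the toy "variables assigned so far" analysis used to exercise it are assumptions, not part of the course material.

```python
def solve_forward(nodes, preds, transfer, join, bottom, entry, init):
    """Chaotic iteration for a forward analysis.

    nodes: list of statement ids; preds: dict node -> list of predecessors;
    transfer: dict node -> function on values.
    Returns the dicts (IN, OUT) at the fixed point.
    """
    out = {n: bottom for n in nodes}
    inn = {n: bottom for n in nodes}
    out[entry] = init
    changed = True
    while changed:                       # repeat until no values change
        changed = False
        for n in nodes:
            if n == entry:
                continue
            value = bottom
            for p in preds[n]:           # IN[s] = join of OUT over predecessors
                value = join(value, out[p])
            new_out = transfer[n](value) # OUT[s] = f_s(IN[s])
            if value != inn[n] or new_out != out[n]:
                inn[n], out[n] = value, new_out
                changed = True
    return inn, out

# Toy analysis (hypothetical): which variables may have been assigned so far.
NODES = ["entry", "s1", "s2", "s3"]
PREDS = {"s1": ["entry"], "s2": ["s1", "s3"], "s3": ["s2"]}   # s2 <-> s3 is a loop
TRANSFER = {
    "s1": lambda v: v | {"x"},
    "s2": lambda v: v | {"y"},
    "s3": lambda v: v | {"z"},
}
IN, OUT = solve_forward(NODES, PREDS, TRANSFER, lambda a, b: a | b,
                        frozenset(), "entry", frozenset())
print(sorted(IN["s2"]))  # ['x', 'y', 'z']
```

The loop visits statements in list order; any other order would reach the same fixed point here, only possibly in a different number of rounds.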


Proving termination

• Our algorithm for running these analyses continuously loops until no changes are detected

• Problem: how do we know the analyses will eventually terminate?

A non-terminating analysis

• The following analysis will loop infinitely on any CFG containing a loop:

• Direction: Forward
• Domain: ℕ
• Join operator: max
• Transfer function: f(n) = n + 1
• Initial value: 0

[Figure sequence (Initialization, Fixed-point iteration, Iterations 1 to 3): a CFG with nodes start, x = y and end, where x = y sits on a loop. All values start at 0; in each iteration the value after x = y is incremented, reaching 1, 2, 3 over the three iterations shown]

Why doesn’t this terminate?
• Values can increase without bound
• Note that “increase” refers to the lattice ordering, not the ordering on the natural numbers
• The height of a semilattice is the length of the longest increasing sequence in that semilattice
• The dataflow framework is not guaranteed to terminate for semilattices of infinite height
• Note that a semilattice can be infinitely large but have finite height
  – e.g. constant propagation

[Figure: the infinite increasing chain 0, 1, 2, 3, 4, ...]

Height of a lattice
• An increasing chain is a sequence of elements a1 ⊏ a2 ⊏ … ⊏ ak
  – The length of such a chain is k
• The height of a lattice is the length of the maximal increasing chain
• For liveness with n program variables:
  – {} ⊂ {v1} ⊂ {v1, v2} ⊂ … ⊂ {v1, …, vn}
• For available expressions it is the number of expressions of the form a = b op c
  – For n program variables and m operator types: m·n³

Another non-terminating analysis

• This analysis works on a finite-height semilattice, but will not terminate on certain CFGs:

• Direction: Forward
• Domain: Boolean values true and false
• Join operator: Logical OR
• Transfer function: Logical NOT
• Initial value: false

[Figure sequence (Initialization, Fixed-point iteration, Iterations 1 to 3): the same CFG with nodes start, x = y and end. All values start at false; the value after x = y flips to true, then false, then true again over the three iterations shown]

Why doesn’t it terminate?

• Values can loop indefinitely
• Intuitively, the join operator keeps pulling values up
• If the transfer function can keep pushing values back down again, then the values might cycle forever
• How can we fix this?

[Figure: the cycle false, true, false, true, false, ...]
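Both failure modes are easy to reproduce by running the iteration for a bounded number of rounds. The sketch below assumes the simplest CFG with a loop (start → B, B → B, B → end) and tracks only OUT[B]; the function name `run_rounds` is made up for this illustration.

```python
def run_rounds(join, transfer, init, rounds):
    """Iterate the analysis on a single block B that is its own successor.

    Returns the successive values of OUT[B]; a terminating analysis would
    eventually repeat the same value forever.
    """
    out_start, out_b = init, init
    history = []
    for _ in range(rounds):
        in_b = join(out_start, out_b)   # join over the predecessors of B
        out_b = transfer(in_b)          # apply B's transfer function
        history.append(out_b)
    return history

# Domain N, join = max, f(n) = n + 1: values grow without bound.
counting = run_rounds(max, lambda n: n + 1, 0, 5)
# Domain {false, true}, join = OR, f = NOT: values cycle forever.
flipping = run_rounds(lambda a, b: a or b, lambda v: not v, False, 5)

print(counting)  # [1, 2, 3, 4, 5]
print(flipping)  # [True, False, True, False, True]
```

In the first run the semilattice has infinite height; in the second the height is finite but the transfer function undoes what the join did.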

Monotone transfer functions
• A transfer function f is monotone iff: if x ⊑ y, then f(x) ⊑ f(y)
• Intuitively, if you know less information about a program point, you can't “gain back” more information about that program point
• Many transfer functions are monotone, including those for liveness and constant propagation
• Note: Monotonicity does not mean that x ⊑ f(x)
  – (This is a different property called extensivity)

Liveness and monotonicity

• A transfer function f is monotone iff: if x ⊑ y, then f(x) ⊑ f(y)
• Recall our transfer function for a = b + c is
  – f_{a = b + c}(V) = (V – {a}) ∪ {b, c}
• Recall that our join operator is set union and induces an ordering relationship X ⊑ Y iff X ⊆ Y
• Is this monotone?
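One way to build confidence in the answer is to test the definition exhaustively on a small universe. A minimal sketch, assuming four variables a, b, c, d; `powerset`, `liveness_transfer` and `is_monotone` are names invented for this example.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def liveness_transfer(live):
    """Transfer function for a = b + c: (V - {a}) union {b, c}."""
    return (live - {"a"}) | {"b", "c"}

def is_monotone(f, values):
    """True iff x <= y implies f(x) <= f(y), where <= is the subset relation."""
    return all(f(x) <= f(y) for x in values for y in values if x <= y)

VALUES = powerset({"a", "b", "c", "d"})
print(is_monotone(liveness_transfer, VALUES))                    # True
print(is_monotone(lambda x: frozenset("abcd") - x, VALUES))      # False
```

The second call uses set complement, which reverses the ordering, as a contrasting non-monotone function.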

Is constant propagation monotone?

• A transfer function f is monotone iff: if x ⊑ y, then f(x) ⊑ f(y)
• Recall our transfer functions
  – f_{x=k}(V) = V|x↦k (update V by mapping x to k)
  – f_{x=a+b}(V) = V|x↦Not-a-Constant (assign Not-a-Constant)
• Is this monotone?

[Figure: the constant propagation semilattice over Undefined, the integer constants ..., -2, -1, 0, 1, 2, ..., and Not-a-Constant]

The grand result

• Theorem: A dataflow analysis with a finite-height semilattice and a family of monotone transfer functions always terminates
• Proof sketch:
  – The join operator can only bring values up
  – Transfer functions can never lower values back down below where they were in the past (monotonicity)
  – Values cannot increase indefinitely (finite height)

An “optimality” result
• A transfer function f is distributive if f(a ⊔ b) = f(a) ⊔ f(b) for all domain elements a and b
• If all transfer functions are distributive then the fixed-point solution is equal to the solution computed by joining results from all (potentially infinitely many) control-flow paths
  – Join over all paths
• Optimal if we pretend all control-flow paths can be executed by the program, that is, if we ignore program conditions
• Which analyses use distributive functions?
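A standard counterexample shows what non-distributivity looks like. The sketch below is an assumption-laden illustration: it uses a constant-propagation transfer function that folds a + b when both operands are known constants, which is more precise than the simplified one on these slides, and all names (`UNDEF`, `NAC`, `join_env`, `transfer_x_eq_a_plus_b`) are invented here.

```python
UNDEF, NAC = "undef", "nac"   # bottom and top of the per-variable lattice

def join_value(u, v):
    """Join of two abstract values: equal constants stay, different ones go to NAC."""
    if u == UNDEF:
        return v
    if v == UNDEF:
        return u
    return u if u == v else NAC

def join_env(e1, e2):
    """Pointwise join of two environments over the same variables."""
    return {k: join_value(e1[k], e2[k]) for k in e1}

def transfer_x_eq_a_plus_b(env):
    """x = a + b, folding the sum when both operands are known constants."""
    a, b = env["a"], env["b"]
    new = dict(env)
    if UNDEF in (a, b):
        new["x"] = UNDEF
    elif NAC in (a, b):
        new["x"] = NAC
    else:
        new["x"] = a + b
    return new

# Two paths reach the statement x = a + b with different constants.
path1 = {"a": 1, "b": 2, "x": UNDEF}
path2 = {"a": 2, "b": 1, "x": UNDEF}

joined_then_applied = transfer_x_eq_a_plus_b(join_env(path1, path2))
applied_then_joined = join_env(transfer_x_eq_a_plus_b(path1),
                               transfer_x_eq_a_plus_b(path2))

print(joined_then_applied["x"])  # nac
print(applied_then_joined["x"])  # 3
```

Joining first loses the fact that x is 3 on both paths, so f(a ⊔ b) ≠ f(a) ⊔ f(b) for this transfer function. Transfer functions of the form (V – kill) ∪ gen, such as those for liveness and reaching definitions, do not have this problem.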

Loop optimizations
• Most of a program’s computations are done inside loops
  – Focus optimization effort on loops
• The optimizations we’ve seen so far are independent of the control structure
• Some optimizations are specialized to loops
  – Loop-invariant code motion
  – (Strength reduction via induction variables)
• Require another type of analysis to find out where expressions get their values from
  – Reaching definitions
• (Also useful for improving register allocation)

Loop invariant computation

[Figure: CFG. After start, the entry block contains y = …, t = …, z = …. The loop block contains y = t * 4 and the test x < y + z, which leads either to end or to x = x + 1 and back to the loop block]

t*4 and y+z have the same value on each iteration

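Code hoisting

[Figure: the same CFG after hoisting. The entry block now contains y = …, t = …, z = …, y = t * 4, w = y + z; the loop contains only the test x < w and x = x + 1]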

What reasoning did we use?

[Figure: the original CFG with y = t * 4 and x < y + z inside the loop, annotated as follows]

• Both t and z are defined only outside of the loop
• Constants are trivially loop-invariant
• y is defined inside the loop, but it is loop-invariant since t*4 is loop-invariant

What about now?

[Figure: the same CFG, with t = t + 1 added after x = x + 1 inside the loop]

Now t is not loop-invariant, and so neither are t*4 and y

Loop-invariant code motion
• d: t = a1 op a2
  – d is a program location
• a1 op a2 is loop-invariant (for a loop L) if it computes the same value in each iteration
  – Hard to know in general
• Conservative approximation
  – Each ai is a constant, or
  – All definitions of ai that reach d are outside L, or
  – Only one definition of ai reaches d, and it is loop-invariant itself
• Transformation: hoist the loop-invariant code outside of the loop

Reaching definitions analysis
• A definition d: t = … reaches a program location if there is a path from the definition to the program location, along which the defined variable is never redefined
• Direction: Forward
• Domain: sets of program locations that are definitions
• Join operator: union
• Transfer function:
  – f_{d: a = b op c}(RD) = (RD - defs(a)) ∪ {d}
  – f_{d: not-a-def}(RD) = RD
  – Where defs(a) is the set of locations defining a (statements of the form a = ...)
• Initial value: {}
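The analysis can be sketched directly from this definition. The code below runs it on the running loop example (definitions d1 to d5); the encoding of the CFG as `NODES`/`PREDS`, the node name `test` for the condition x < y + z, and the function names are assumptions made for this illustration.

```python
# Each definition label maps to the variable it defines.
# The test "x < y + z" defines nothing, so it is not in DEFS.
DEFS = {"d1": "y", "d2": "t", "d3": "z", "d4": "y", "d5": "x"}
NODES = ["start", "d1", "d2", "d3", "d4", "test", "d5", "end"]
PREDS = {
    "start": [],
    "d1": ["start"], "d2": ["d1"], "d3": ["d2"],
    "d4": ["d3", "d5"],            # loop header: entered from d3 and from d5
    "test": ["d4"],
    "d5": ["test"], "end": ["test"],
}

def transfer(node, rd):
    """(RD - defs(a)) union {d} for a definition d of a; identity otherwise."""
    if node not in DEFS:
        return rd
    var = DEFS[node]
    killed = {d for d, v in DEFS.items() if v == var}
    return (rd - killed) | {node}

def reaching_definitions():
    """Iterate to a fixed point; returns the dicts (IN, OUT)."""
    inn = {n: frozenset() for n in NODES}
    out = {n: frozenset() for n in NODES}
    changed = True
    while changed:
        changed = False
        for n in NODES:
            new_in = frozenset().union(*(out[p] for p in PREDS[n]))
            new_out = frozenset(transfer(n, new_in))
            if (new_in, new_out) != (inn[n], out[n]):
                inn[n], out[n] = new_in, new_out
                changed = True
    return inn, out

IN, OUT = reaching_definitions()
print(sorted(IN["d4"]))   # ['d1', 'd2', 'd3', 'd4', 'd5']
print(sorted(OUT["d4"]))  # ['d2', 'd3', 'd4', 'd5']
```

With this visiting order the values stabilize after two rounds that change something; a different order may need more rounds but reaches the same result.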

Reaching definitions analysis

[Figure sequence (Initialization, Iterations 1 to 6): fixed-point iteration of reaching definitions on the loop example. After start, the entry block contains d1: y = …, d2: t = …, d3: z = …. The loop contains d4: y = t * 4 and the test x < y + z, which leads either to end or to d5: x = x + 1 and back to d4. All values are initialized to {}]

Fixed-point values:
• Before d1: {}
• After d1: {d1}
• After d2: {d1, d2}
• After d3: {d1, d2, d3}
• Before d4: {d1, d2, d3, d4, d5}
• After d4, after the test x < y + z, before and after d5, and at end: {d2, d3, d4, d5}

Which expressions are loop invariant

[Figure: the loop example annotated with the fixed-point reaching definitions]

• t is defined only in d2 – outside of the loop
• z is defined only in d3 – outside of the loop
• The only definition of y that reaches the test x < y + z is d4 – inside of the loop, but it depends on t and 4, both loop-invariant
• x is defined only in d5 – inside of the loop, so it is not loop-invariant

Inferring loop-invariant expressions
• For a statement s of the form t = a1 op a2
• A variable ai is immediately loop-invariant if all of its reaching definitions in IN[s] = {d1, …, dk} are outside of the loop
• LOOP-INV = immediately loop-invariant variables and constants
  LOOP-INV = LOOP-INV ∪ {x | d: x = a1 op a2, d is in the loop, and both a1 and a2 are in LOOP-INV}
  – Iterate until fixed-point
• An expression is loop-invariant if all of its operands are loop-invariant
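The LOOP-INV computation can be sketched on the running example. This is an illustration under assumptions: the reaching-definition sets are hardcoded from the fixed point of the example, the condition x < y + z is treated as one more use site (named `test` here), and the data layout and function names are invented.

```python
DEF_VAR = {"d1": "y", "d2": "t", "d3": "z", "d4": "y", "d5": "x"}
LOOP = {"d4", "d5"}   # definitions located inside the loop

# Operands used at each program point inside the loop.
USES = {"d4": ["t", 4], "test": ["x", "y", "z"], "d5": ["x", 1]}
# Statements of the form target = a1 op a2 inside the loop.
LOOP_STMTS = {"d4": ("y", "t", 4), "d5": ("x", "x", 1)}
# Reaching definitions before each use site (fixed point of the example).
IN = {
    "d4": {"d1", "d2", "d3", "d4", "d5"},
    "test": {"d2", "d3", "d4", "d5"},
    "d5": {"d2", "d3", "d4", "d5"},
}

def immediately_invariant(var, site):
    """All definitions of `var` reaching `site` lie outside the loop."""
    reaching = {d for d in IN[site] if DEF_VAR[d] == var}
    return reaching.isdisjoint(LOOP)

def loop_invariants():
    """Constants and immediately invariant variables, closed under the rule."""
    inv = set()
    for site, operands in USES.items():
        for a in operands:
            if isinstance(a, int) or immediately_invariant(a, site):
                inv.add(a)
    changed = True
    while changed:                       # iterate until fixed point
        changed = False
        for target, a1, a2 in LOOP_STMTS.values():
            if target not in inv and a1 in inv and a2 in inv:
                inv.add(target)
                changed = True
    return inv

LOOP_INV = loop_invariants()
print("y" in LOOP_INV, "x" in LOOP_INV)  # True False
```

The result contains t, z, y and the constants 4 and 1; x is excluded because its only reaching definition, d5, is inside the loop.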

Computing LOOP-INV

[Figure sequence: the loop example annotated with the fixed-point reaching definitions, with LOOP-INV growing step by step]

• (immediately) LOOP-INV = {t}
• (immediately) LOOP-INV = {t, z}
• LOOP-INV = {t, z, 4}
• LOOP-INV = {t, z, 4, y}

Induction variables

while (i < x) {
  j = a + 4 * i
  a[j] = j
  i = i + 1
}

i is incremented by a loop-invariant expression on each iteration – this is called an induction variable

j is a linear function of the induction variable with multiplier 4

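Strength-reduction

j = a + 4 * i
while (i < x) {
  a[j] = j
  i = i + 1
  j = j + 4
}

Prepare initial value: j = a + 4 * i before the loop

Increment by multiplier: j = j + 4 at the end of each iteration, after j has been used, so that every iteration sees the same j as before the transformation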
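The transformation can be checked by running both versions side by side. A minimal sketch: the dictionary `mem` is a hypothetical stand-in for the memory written by a[j] = j, and in this sketch the increment j = j + 4 is placed at the end of the loop body so that each iteration uses the same j as the original loop.

```python
def original(a, i, x):
    """Loop with a multiplication in every iteration."""
    mem = {}
    while i < x:
        j = a + 4 * i
        mem[j] = j
        i = i + 1
    return mem

def strength_reduced(a, i, x):
    """Same loop with the multiplication replaced by an addition."""
    mem = {}
    j = a + 4 * i          # prepare initial value
    while i < x:
        mem[j] = j
        i = i + 1
        j = j + 4          # increment by the multiplier
    return mem

print(original(100, 0, 3))          # {100: 100, 104: 104, 108: 108}
print(strength_reduced(100, 0, 3))  # {100: 100, 104: 104, 108: 108}
```

Placing the increment before the use instead would shift every j by 4 relative to the original loop.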

Summary of optimizations

Enabled optimizations                                  Analysis
Common-subexpression elimination, copy propagation     Available Expressions
Constant folding                                       Constant Propagation
Dead code elimination                                  Live Variables
Loop-invariant code motion                             Reaching Definitions

Global Register Allocation

Registers

• Most machines have a set of registers, dedicated memory locations that
  – can be accessed quickly,
  – can have computations performed on them, and
  – exist in small quantity
• Using registers intelligently is a critical step in any compiler
  – A good register allocator can generate code orders of magnitude better than a bad register allocator

Register allocation
• In TAC, there is an unlimited number of variables
• On a physical machine there is a small number of registers:
  – x86 has four general-purpose registers and a number of specialized registers
  – MIPS has twenty-four general-purpose registers and eight special-purpose registers
• Register allocation is the process of assigning variables to registers and managing data transfer in and out of registers

Challenges in register allocation
• Registers are scarce
  – Often substantially more IR variables than registers
  – Need to find a way to reuse registers whenever possible
• Registers are complicated
  – x86: Each register is made of several smaller registers; can't use a register and its constituent registers at the same time
  – x86: Certain instructions must store their results in specific registers; can't store values there if you want to use those instructions
  – MIPS: Some registers are reserved for the assembler or operating system
  – Most architectures: Some registers must be preserved across function calls

Simple approach

• Straightforward solution:
  – Allocate each variable in the activation record
  – At each instruction, bring the values needed into registers, perform the operation, then store the result to memory

x = y + z

mov 16(%ebp), %eax
mov 20(%ebp), %ebx
add %ebx, %eax
mov %eax, 24(%ebp)

• Problem: program execution is very inefficient – moving data back and forth between memory and registers

Find a register allocation

b = a + 2
c = b * b
b = c + 1
return b * a

Available registers: eax, ebx

register   variable
?          a
?          b
?          c

Is this a valid allocation?

register   variable
eax        a
ebx        b
eax        c

b = a + 2         ebx = eax + 2
c = b * b         eax = ebx * ebx
b = c + 1         ebx = eax + 1
return b * a      return ebx * eax

No: eax = ebx * ebx overwrites the previous value of ‘a’, which is also stored in eax

Is this a valid allocation?

register   variable
ebx        a
eax        b
eax        c

b = a + 2         eax = ebx + 2
c = b * b         eax = eax * eax
b = c + 1         eax = eax + 1
return b * a      return eax * ebx

Yes: the value of ‘c’ stored in eax is not needed anymore, so eax can be reused for ‘b’

Main idea
• For every node n in the CFG, we have out[n]
  – Set of temporaries live out of n
• Two variables interfere if they appear in the same out[n] of any node n
  – Cannot be allocated to the same register
• Conversely, if two variables do not interfere with each other, they can be assigned the same register
  – We say they have disjoint live ranges
• How to assign registers to variables?

Interference graph

• Nodes of the graph = variables
• Edges connect variables that interfere with one another
• Nodes will be assigned a color corresponding to the register assigned to the variable
• Two nodes connected by an edge can’t be assigned the same color

Interference graph construction

[Figure sequence: live-variable sets computed backward through the example]

{a}
b = a + 2
{b, a}
c = b * b
{a, c}
b = c + 1
{b, a}
return b * a

Interference graph

[Figure: nodes a, b, c with edges a–b and a–c]

Colored graph

[Figure: the same graph colored with two colors, one per register (eax, ebx); a gets one color, b and c share the other]
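The construction can be sketched for straight-line code. In this illustration each statement is encoded as a pair (defined variable, used variables); that encoding and the function names are assumptions made for the example.

```python
# Straight-line code: (defined variable or None, list of used variables)
CODE = [
    ("b", ["a"]),        # b = a + 2
    ("c", ["b"]),        # c = b * b
    ("b", ["c"]),        # b = c + 1
    (None, ["b", "a"]),  # return b * a
]

def live_out_sets(code):
    """Backward liveness on straight-line code; returns out[n] per statement."""
    live = set()                       # nothing is live after the return
    outs = [None] * len(code)
    for n in range(len(code) - 1, -1, -1):
        outs[n] = set(live)
        target, uses = code[n]
        live = (live - {target}) | set(uses)
    return outs

def interference_edges(outs):
    """Two variables interfere if they appear together in some out[n]."""
    edges = set()
    for out in outs:
        for u in out:
            for v in out:
                if u < v:              # store each undirected edge once
                    edges.add((u, v))
    return edges

OUTS = live_out_sets(CODE)
EDGES = interference_edges(OUTS)
print(sorted(EDGES))  # [('a', 'b'), ('a', 'c')]
```

Since b and c never appear in the same out set, they may share a register, which is what the valid allocation in the example does.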

Graph coloring

• This problem is equivalent to graph coloring, which is NP-hard if there are at least three registers
• No good polynomial-time algorithms (or even good approximations!) are known for this problem
• We have to be content with a heuristic that is good enough for the register interference graphs (RIGs) that arise in practice

Coloring by simplification [Kempe 1879]

• How to find a k-coloring of a graph
• Intuition:
  – Suppose we are trying to k-color a graph and find a node with fewer than k edges
  – If we delete this node from the graph and color what remains, we can find a color for this node if we add it back in
  – Reason: fewer than k neighbors ⇒ some color must be left over
• Phase 1: Simplification
  – Repeatedly simplify the graph
  – When a variable (i.e., graph node) is removed, push it on a stack
• Phase 2: Coloring
  – Unwind the stack and reconstruct the graph as follows:
    • Pop a variable from the stack
    • Add it back to the graph
    • Color the node for that variable with a color that it doesn’t interfere with
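The two phases can be sketched as follows. This is an illustration, not a production allocator: the graph is a dictionary from node to neighbor set (assumed symmetric), colors are the integers 0 to k-1, ties are broken alphabetically, and the function name is invented.

```python
def color_by_simplification(graph, k):
    """Kempe-style coloring. graph: dict node -> set of neighbors (symmetric).

    Returns a dict node -> color in range(k), or None if simplification
    gets stuck because every remaining node has at least k neighbors.
    """
    remaining = {n: set(adj) for n, adj in graph.items()}
    stack = []
    # Phase 1: repeatedly remove a node with fewer than k neighbors.
    while remaining:
        candidates = [n for n in sorted(remaining) if len(remaining[n]) < k]
        if not candidates:
            return None
        node = candidates[0]
        stack.append(node)
        for m in remaining.pop(node):
            remaining[m].discard(node)
    # Phase 2: pop nodes and give each a color unused by its colored neighbors.
    coloring = {}
    while stack:
        node = stack.pop()
        used = {coloring[m] for m in graph[node] if m in coloring}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring

RIG = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
SQUARE = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}

print(color_by_simplification(RIG, 2))     # {'c': 0, 'a': 1, 'b': 0}
print(color_by_simplification(SQUARE, 2))  # None
```

The second call gets stuck even though a 4-cycle is 2-colorable, which is exactly the limitation of the heuristic: failure to simplify does not prove that no coloring exists.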

Coloring k=2

[Figure sequence: a graph with nodes a, b, c, d, e and two colors (registers eax and ebx). Simplification removes c, e, a, b, d in that order, so the stack reads d b a e c from top to bottom; the nodes are then popped and colored in the order d, b, a, e, c]

Failure of heuristic

• If the graph cannot be colored, it will eventually be simplified to a graph in which every node has at least K neighbors
• Sometimes, the graph is still K-colorable!
• Finding a K-coloring in all situations is an NP-complete problem
  – We will have to approximate to make register allocators fast enough

Coloring k=2

Some graphs can’t be colored in K colors:

[Figure sequence: a graph with nodes a, b, c, d, e and two colors (eax, ebx). All nodes are pushed, giving the stack c b e a d from top to bottom; c and b are popped and colored, but when e is popped there are no colors left for e!]

Chaitin’s algorithm

• Choose and remove an arbitrary node, marking it “troublesome”
  – Use heuristics to choose which one
  – When adding the node back in, it may be possible to find a valid color
  – Otherwise, we have to spill that node

Spilling
• Phase 3: spilling
  – Once all nodes have K or more neighbors, pick a node for spilling
    • There are many heuristics that can be used to pick a node
    • Try to pick a node that is not used much and is not in an inner loop
    • Storage in the activation record
  – Remove it from the graph
• We can now repeat phases 1-2 without this node
• Better approach – rewrite the code to spill the variable, recompute liveness information and try to color again
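A sketch of simplification with "troublesome" nodes might look like the following. It is a simplified illustration: `spill_cost` is a hypothetical per-variable cost (for example, an estimate of how often the variable is used), the cheapest node is chosen when simplification is stuck, and instead of rewriting the code the function only reports which nodes would have to be spilled.

```python
def color_with_spilling(graph, k, spill_cost):
    """Simplify and color, removing a troublesome node when stuck.

    graph: dict node -> set of neighbors (symmetric);
    spill_cost: dict node -> number (lower means cheaper to spill).
    Returns (coloring, spilled): colors in range(k) and uncolored nodes.
    """
    remaining = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while remaining:
        easy = [n for n in sorted(remaining) if len(remaining[n]) < k]
        if easy:
            node = easy[0]
        else:
            # Every node has >= k neighbors: pick the cheapest troublesome node.
            node = min(sorted(remaining), key=lambda n: spill_cost[n])
        stack.append(node)
        for m in remaining.pop(node):
            remaining[m].discard(node)
    coloring, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {coloring[m] for m in graph[node] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[node] = free[0]   # a valid color was found after all
        else:
            spilled.append(node)       # no color left: this node is spilled
    return coloring, spilled

SQUARE = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
TRIANGLE = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

print(color_with_spilling(SQUARE, 2, {"a": 1, "b": 1, "c": 1, "d": 1}))
print(color_with_spilling(TRIANGLE, 2, {"a": 3, "b": 1, "c": 2}))
```

On the 4-cycle the troublesome node still receives a color when it is added back, so nothing is spilled; on the triangle the cheapest node, b, ends up spilled. A real allocator would then rewrite the code for b, recompute liveness and try again.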

Coloring k=2

[Figure sequence: the failed example revisited with spilling. The stack shrinks from b e a d through e a d, a d and d to empty; e, for which no colors were left, is the spill candidate]

Handling precolored nodes
• Some variables are pre-assigned to registers
  – E.g., mul on x86/Pentium
    • Uses eax; defines eax, edx
  – E.g., call on x86/Pentium
    • Defines (trashes) caller-save registers eax, ecx, edx
• To properly allocate registers, treat these register uses as special temporary variables and enter them into the interference graph as precolored nodes
• Simplify. Never remove a precolored node – it already has a color, i.e., it is a given register
• Coloring. Once the simplified graph consists only of colored nodes, add the other nodes back in and color them using the precolored nodes as a starting point

Optimizing move instructions
• Code generation produces a lot of extra mov instructions
    mov t5, t9
• If we can assign t5 and t9 to the same register, we can get rid of the mov – effectively, copy elimination at the register allocation level
• Idea: if t5 and t9 are not connected in the interference graph, coalesce them into a single variable; the move will be redundant
• Problem: coalescing nodes can make a graph un-colorable
  – Conservative coalescing heuristic
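The slides do not say which conservative heuristic is meant. One well-known choice, usually attributed to Briggs, allows coalescing only if the merged node would have fewer than K neighbors of significant degree (degree at least K). The sketch below assumes that test; the function name and the small example graph are invented for illustration.

```python
def can_coalesce_briggs(graph, u, v, k):
    """Briggs-style conservative test for coalescing the move-related nodes u, v.

    graph: dict node -> set of neighbors (symmetric).
    """
    if v in graph[u]:
        return False                   # interfering nodes can never be merged
    merged_neighbors = (graph[u] | graph[v]) - {u, v}

    def degree_after_merge(n):
        # A neighbor of both u and v loses one edge when they are merged.
        shared = u in graph[n] and v in graph[n]
        return len(graph[n]) - (1 if shared else 0)

    significant = [n for n in merged_neighbors if degree_after_merge(n) >= k]
    return len(significant) < k

# t5 and t9 do not interfere, but merging them would create a triangle with x, y.
G = {
    "t5": {"x"},
    "t9": {"y"},
    "x": {"t5", "y"},
    "y": {"t9", "x"},
}
print(can_coalesce_briggs(G, "t5", "t9", 2))  # False
print(can_coalesce_briggs(G, "t5", "t9", 3))  # True
```

With two registers the test refuses the merge, since the merged graph would be a triangle and no longer 2-colorable; with three registers the merge is safe.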

Summary of material 1/2

Compiler task: Scanning
Techniques:
• Regular expressions
• Finite automata (DFA/NFA)
• Determinization via subset construction
• Maximal munch and precedences
• Automatic scanner generation tools (JFlex)

Compiler task: Parsing
Techniques:
• Context-free grammars
• Leftmost/rightmost derivations, parse trees
• Ambiguity / ambiguity elimination tactics
• LL parsing: building prediction tables (FIRST/FOLLOW), conflicts, left-recursion elimination, recursive descent, automata-based parsing
• Shift-reduce parsing: LR items, transition relation construction, conflicts, SLR, LALR, resolving ambiguity via precedence, automatic parser generation tools (CUP)

Summary of material 2/2

Compiler task: Lowering to IR
Techniques:
• Three-Address Code and recursive lowering
• Sethi-Ullman translation minimizing the number of temporaries

Compiler task: Optimizations
Techniques:
• Basic blocks, control flow graphs
• Local analysis: transfer functions
• Local analysis vs. global analysis
• Dataflow analysis: join semilattices, partial orderings, monotone transfer functions
• Available expressions, liveness, constant propagation, reaching definitions
• Common-subexpression elimination, copy propagation, constant folding, loop-invariant code motion

Compiler task: Register allocation
Techniques:
• Naïve allocation
• Register interference graph – isomorphism to graph coloring
• Graph coloring by simplification
• Chaitin’s algorithm (spilling)

Good luck with final project and exams!

I hope some of this was interesting

Advertisement for next semester course: Program Analysis and Verification

http://www.cs.bgu.ac.il/~paver132/Main
