
Data-Flow Analysis II

CS 671, March 13, 2008

2 CS 671 – Spring 2008

Data-Flow Analysis

Gather conservative, approximate information about what a program does

Result: some property that holds every time the instruction executes

The Data-Flow Abstraction

Execution of an instruction transforms program state

To analyze a program, we must consider all possible sequences of program points (paths)

Summarize all possible program states with a finite set of facts

• Limitation: may consider some infeasible paths


The General Approach

Setting up and solving systems of equations that relate information at various points in the program

such as out[S] = gen[S] ∪ ( in[S] - kill[S] ), where
– S is a statement
– in[S] and out[S] are the information before and after S
– gen[S] and kill[S] are the information generated and killed by S

The definition of in, out, gen, and kill depends on the desired information
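The equation above can be tried on concrete sets. This is a minimal sketch, assuming facts are plain strings held in Python sets; the statement and the fact names d1, d2, d3 are hypothetical.

```python
def transfer(gen, kill, in_set):
    """out[S] = gen[S] | (in[S] - kill[S]), written with Python set operators."""
    return gen | (in_set - kill)

# Hypothetical statement S: it generates fact "d3" and kills fact "d1".
in_S = {"d1", "d2"}
out_S = transfer(gen={"d3"}, kill={"d1"}, in_set=in_S)
print(sorted(out_S))  # ['d2', 'd3']
```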


Data-Flow Analysis (cont.)

Properties:
• either a forward analysis (out as a function of in) or a backward analysis (in as a function of out)
• either an “along some path” problem or an “along all paths” problem
• data-flow analysis must be conservative

Definitions:
• a point lies between two statements (or before the first statement and after the last)
• a path is a sequence of consecutive points in the control-flow graph


Example – Live Variables

Steps:
• Set up live sets for each program point
• Instantiate equations
• Solve equations

Statements of the example control-flow graph:

if (c)
x = y+1
y = 2*z
if (d)
x = y+z
z = 1
z = x


Example

Program points

[Figure: the example control-flow graph annotated with program points L1 through L12]


Example

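[Figure: the example control-flow graph with program points L1 through L12 and statements numbered 1 through 7]

Stmt   Statement   Defs   Uses
1      if (c)      { }    {c}
2      x = y+1     {x}    {y}
3      y = 2*z     {y}    {z}
4      if (d)      { }    {d}
5      x = y+z     {x}    {y, z}
6      z = 1       {z}    { }
7      z = x       {z}    {x}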


Example

[Figure: the example control-flow graph with statements numbered 1 through 7; the live sets L1 through L12 are to be computed]

in[I] = ( out[I] - def[I] ) ∪ use[I]

out[B] = ∪ in[B’] over all B’ ∈ succ(B)

Initial values: every live set L1 through L12 starts as { }
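The two liveness equations can be solved by simple iteration. The sketch below is one possible setup, not the lecture's own solution: the slides do not give the edges of the example graph, so the successor map `succ` is an assumption, as is the numbering of the statements 1 to 7 in listing order.

```python
def liveness(succ, use, defs):
    """Iterate in[I] = (out[I] - def[I]) | use[I] and
    out[I] = union of in[S] over successors S, until nothing changes."""
    live_in = {n: set() for n in succ}
    live_out = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        # Backward problem: visiting later statements first converges faster.
        for n in sorted(succ, reverse=True):
            out_n = set()
            for s in succ[n]:
                out_n |= live_in[s]
            in_n = (out_n - defs[n]) | use[n]
            if out_n != live_out[n] or in_n != live_in[n]:
                live_out[n], live_in[n] = out_n, in_n
                changed = True
    return live_in, live_out

# ASSUMED edges (not given in the slides): 1 branches to 2 or 6, 4 to 5 or 7.
succ = {1: [2, 6], 2: [3], 3: [4], 4: [5, 7], 5: [7], 6: [7], 7: []}
defs = {1: set(), 2: {"x"}, 3: {"y"}, 4: set(), 5: {"x"}, 6: {"z"}, 7: {"z"}}
use = {1: {"c"}, 2: {"y"}, 3: {"z"}, 4: {"d"}, 5: {"y", "z"}, 6: set(), 7: {"x"}}

live_in, live_out = liveness(succ, use, defs)
print(sorted(live_in[1]))  # ['c', 'd', 'x', 'y', 'z'] under the assumed edges
```

Nothing is assumed live after statement 7; a different exit assumption would change the sets.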


More Terminology

Successors

Succ(B1) =

Succ(B2) =

Succ(B3) =

Predecessors

Pred(B2) =

Pred(B3) =

Pred(B4) =

[Figure: flow graph with nodes B1, B2, B3, B4]

Branch node – more than one successor

Join node – more than one predecessor


Dominators

Dominance is a binary relation on the flow graph nodes that allows us to easily find loops

Node d dominates node i (d dom i) if every possible execution path from entry to i includes d

Dominance is:
• Reflexive – every node dominates itself
• Transitive – if a dom b and b dom c, then a dom c
• Antisymmetric – if a dom b and b dom a, then a = b

[Figure: flow graph with nodes entry, B1, B2, B3, B4, B5, B6, exit]

dom(entry) =
dom(b1) =
dom(b2) =
dom(b3) =
dom(b4) =
dom(b5) =
dom(b6) =
dom(exit) =
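Dominator sets can themselves be computed as an all-paths data-flow problem: dom(n) = {n} ∪ (∩ dom(p) over predecessors p of n). The sketch below assumes every node is reachable from the entry. The edges of the graph are hypothetical, since the slide's figure did not survive, so the printed sets are not the answers to the exercise above.

```python
def dominators(succ, entry):
    """Iterative dominator computation over a successor map."""
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry or not pred[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# HYPOTHETICAL edges, chosen only for illustration.
succ = {
    "entry": ["B1"], "B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"],
    "B4": ["B5", "B6"], "B5": ["B4"], "B6": ["exit"], "exit": [],
}
dom = dominators(succ, "entry")
print(sorted(dom["B4"]))  # ['B1', 'B4', 'entry']
```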


Immediate dominators

idom(b) = a iff (a ≠ b) and (a dom b) and there does not exist a node c, different from a and b, such that (a dom c) and (c dom b)

• The idom of a node is unique

• The idom relationship forms a tree whose root is the entry node

[Figure: flow graph with nodes entry, B1, B2, B3, B4, B5, B6, exit]

idom(b1) =
idom(b2) =
idom(b3) =
idom(b4) =
idom(b5) =
idom(b6) =
idom(exit) =
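Given the dominator sets, the immediate dominator can be read off directly: the strict dominators of a node form a chain, and the closest one is the one with the largest dominator set. A minimal sketch, where the `dom` table is a hypothetical input and not the solution to the figure above:

```python
def immediate_dominators(dom, entry):
    """Pick, for each node, its closest strict dominator."""
    idom = {}
    for n, ds in dom.items():
        if n == entry:
            continue  # the entry node has no immediate dominator
        strict = ds - {n}
        # The closest strict dominator is dominated by all the others,
        # so it has the largest dominator set of the chain.
        idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom

# HYPOTHETICAL dominator sets for an illustrative graph.
dom = {
    "entry": {"entry"},
    "B1": {"entry", "B1"},
    "B2": {"entry", "B1", "B2"},
    "B3": {"entry", "B1", "B3"},
    "B4": {"entry", "B1", "B4"},
    "B5": {"entry", "B1", "B4", "B5"},
}
idom = immediate_dominators(dom, "entry")
print(idom["B4"], idom["B5"])  # B1 B4
```

The resulting map is the parent relation of the dominator tree rooted at the entry node.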


Strict Dominators and Postdominators

(d sdom i) if d dominates i and d ≠ i

(p pdom i) if every possible execution path from i to exit includes p

[Figure: flow graph with nodes entry, B1, B2, B3, B4, B5, B6, exit]

pdom(entry) =
pdom(b1) =
pdom(b2) =
pdom(b3) =
pdom(b4) =
pdom(b5) =
pdom(b6) =
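Postdominators are dominators of the reversed flow graph, with exit playing the role of entry. The sketch below assumes every node can reach exit and uses hypothetical edges, so its output is not the answer to the exercise above.

```python
def dominators(succ, entry):
    """dom(n) = {n} | intersection of dom(p) over predecessors p of n."""
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry or not pred[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def postdominators(succ, exit_node):
    """Reverse every edge, then compute dominators from the exit node."""
    reverse = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            reverse[t].append(n)
    return dominators(reverse, exit_node)

# HYPOTHETICAL edges, chosen only for illustration.
succ = {
    "entry": ["B1"], "B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"],
    "B4": ["B5", "B6"], "B5": ["B4"], "B6": ["exit"], "exit": [],
}
pdom = postdominators(succ, "exit")
print(sorted(pdom["B2"]))  # ['B2', 'B4', 'B6', 'exit']
```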


Loops

Back edge – edge whose head dominates its tail

A loop containing this type of back edge is a natural loop
• i.e. it has a single external entry point

For back edge b → c, the loop header is c

[Figure: flow graph with nodes entry, B1, B2, B3, exit]

Natural loops =
Loop header (B3 → B1) =
Loop header (B2 → B2) =
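Back edges and their natural loops can be found mechanically once dominator sets are known. In this sketch the edge list is an assumption (the figure only names the nodes and the two back edges B3 → B1 and B2 → B2), and the `dom` table is written out by hand for that assumed graph.

```python
def back_edges(succ, dom):
    """An edge tail -> head is a back edge when head dominates tail."""
    return [(t, h) for t in succ for h in succ[t] if h in dom[t]]

def natural_loop(succ, tail, header):
    """Header plus every node that can reach tail without passing through header."""
    pred = {n: [] for n in succ}
    for n in succ:
        for s in succ[n]:
            pred[s].append(n)
    loop = {header}
    work = []
    if tail not in loop:
        loop.add(tail)
        work.append(tail)
    while work:
        n = work.pop()
        for p in pred[n]:
            if p not in loop:
                loop.add(p)
                work.append(p)
    return loop

# ASSUMED edges for the five-node figure.
succ = {"entry": ["B1"], "B1": ["B2"], "B2": ["B2", "B3"],
        "B3": ["B1", "exit"], "exit": []}
# Dominator sets of the assumed graph, written out by hand.
dom = {"entry": {"entry"}, "B1": {"entry", "B1"}, "B2": {"entry", "B1", "B2"},
       "B3": {"entry", "B1", "B2", "B3"},
       "exit": {"entry", "B1", "B2", "B3", "exit"}}

edges = back_edges(succ, dom)
print(edges)  # [('B2', 'B2'), ('B3', 'B1')]
```

Under these assumed edges the loop of B3 → B1 contains B1, B2 and B3, and the loop of B2 → B2 contains only B2.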


Quicksort Example

How might we optimize this code?

b1:
    i := m-1
    j := n
    t1 := 4*n
    v := a[t1]

b2:
    i := i+1
    t2 := 4*i
    t3 := a[t2]
    if t3 < v goto b2

b3:
    j := j-1
    t4 := 4*j
    t5 := a[t4]
    if t5 > v goto b3

b4:
    if i >= j goto b6

b5:
    t6 := 4*i
    x := a[t6]
    t7 := 4*i
    t8 := 4*j
    t9 := a[t8]
    a[t7] := t9
    t10 := 4*j
    a[t10] := x

b6:
    t11 := 4*i
    x := a[t11]
    t12 := 4*i
    t13 := 4*n
    t14 := a[t13]
    a[t12] := t14
    t15 := 4*n
    a[t15] := x

[Quicksort] (the variables i, j, v, and x are needed outside this fragment)


Reaching Definitions

Informally:
• determine whether a particular definition (e.g. “x” in “x = 5”) may reach a given point in the program

Why reaching definitions may be useful:

x := 5

y := x + 2

If “x := 5” is the only definition reaching “y := x+2”, the latter can be simplified to “y := 7” (constant propagation)



Definition of a variable X:
• is a statement that assigns (or may assign) a value to X
• unambiguous: X := 3
• ambiguous: foo(X) or *Y := 3

A definition d reaches a point p:
• if there is a path from the point immediately following d to p,
• such that d is not killed along that path.

A definition d of variable X is killed along path p
• if there is another definition of X along p.


Reaching Definitions (cont.)

Has the following properties:
• forward analysis
• “along some path” problem

Is conservative in that:
• definition d may not define variable X
• along a path p, there is another definition of X, but this other definition is ambiguous
• definition d may be killed along infeasible paths


Data-Flow Analysis: Structured Programs

Most programs are structured:
• sequence of statements
• if-then-else construct
• while-loops (including for-loops, loops with breaks, ...)

For these programs, we may use an inductive (syntax driven) approach:

[Figure: flow-graph regions labeled 1, 2, 3, then 2-3, then 1-2-3]


Reaching Definitions for Structured Programs

Single assignment: S is d: a = b+c

gen[S] = {d}
kill[S] = All-defs-of-a - {d}
out[S] = gen[S] ∪ ( in[S] - kill[S] )

Sequence: S is S1 followed by S2

gen[S] = gen[S2] ∪ ( gen[S1] - kill[S2] )
kill[S] = kill[S2] ∪ ( kill[S1] - gen[S2] )
in[S1] = in[S]
in[S2] = out[S1]
out[S] = out[S2]


Reaching Definitions for Structured Programs (cont.)

Conditional: S branches to either S1 or S2

gen[S] = gen[S1] ∪ gen[S2]
kill[S] = kill[S1] ∩ kill[S2]
in[S1] = in[S2] = in[S]
out[S] = out[S1] ∪ out[S2]

Loop: S repeats S1

gen[S] = gen[S1]
kill[S] = kill[S1]
in[S1] = in[S] ∪ gen[S1]
out[S] = out[S1]
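The gen/kill rules for structured programs compose bottom-up over the syntax tree. The following sketch encodes each rule as a function on a (gen, kill) summary; the `Summary` type, the function names, and the three definitions d1 to d3 are illustrative assumptions.

```python
from typing import NamedTuple

class Summary(NamedTuple):
    gen: frozenset
    kill: frozenset

def assign(d, all_defs_of_var):
    """d: a = b+c  ->  gen = {d}, kill = All-defs-of-a - {d}."""
    return Summary(frozenset({d}), frozenset(all_defs_of_var) - {d})

def seq(s1, s2):
    """S1 followed by S2."""
    return Summary(s2.gen | (s1.gen - s2.kill), s2.kill | (s1.kill - s2.gen))

def if_else(s1, s2):
    """Either S1 or S2 runs: union of gen, intersection of kill."""
    return Summary(s1.gen | s2.gen, s1.kill & s2.kill)

def loop(s1):
    """A loop around S1 has the same summary as S1."""
    return Summary(s1.gen, s1.kill)

# Hypothetical program:  d1: a = 1;  if (...) { d2: b = 2 } else { d3: a = 3 }
s1 = assign("d1", {"d1", "d3"})
s2 = assign("d2", {"d2"})
s3 = assign("d3", {"d1", "d3"})

prog = seq(s1, if_else(s2, s3))
straight = seq(s1, s3)  # d1 directly followed by d3
print(sorted(prog.gen), sorted(prog.kill))  # ['d1', 'd2', 'd3'] []
```

In the conditional, d1 survives because only one branch redefines a; in the straight-line version d3 kills d1.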


Iterative Solution: Data-Flow Equations

The inductive approach is only applicable to structured programs
• because it uses the structure of the program to synthesize & distribute the data-flow information

Need a general technique: Iterative Approach
• compute the gen/kill sets of each statement / basic block
• initialize the in/out sets
• repeatedly compute out/in sets until a steady state is reached



Reaching definitions:
• set of definitions that may reach (along one or more paths) a given point
• gen[S]: definition d is in gen[S] if d may reach the end of S, independently of whether it reaches the beginning of S.
• kill[S]: the set of definitions that never reach the end of S, even if they reach the beginning.

Equations:
• in[S] = ∪ (P a predecessor of S) out[P]
• out[S] = gen[S] ∪ ( in[S] - kill[S] )


Reaching Definitions (cont.)

Algorithm:

for each basic block B: out[B] := gen[B];                  (1)
do
    change := false;
    for each basic block B do
        in[B] := ∪ (P a predecessor of B) out[P];          (2)
        old-out := out[B];                                 (3)
        out[B] := gen[B] ∪ ( in[B] - kill[B] );            (4)
        if (out[B] != old-out) then change := true;        (5)
    end
while change
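The algorithm translates almost line for line into Python sets. The sketch below runs it on four blocks b1 to b4 holding definitions d1 to d7; the gen/kill sets follow from which variable each definition assigns (d1, d4, d7 assign i; d2, d5 assign j; d3, d6 assign a). The predecessor map is an assumption, because the slides do not list the edges of the flow graph.

```python
def reaching_definitions(blocks, pred, gen, kill):
    out = {b: set(gen[b]) for b in blocks}                    # line (1)
    in_ = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            in_[b] = set().union(*(out[p] for p in pred[b]))  # line (2)
            old_out = out[b]                                  # line (3)
            out[b] = gen[b] | (in_[b] - kill[b])              # line (4)
            if out[b] != old_out:                             # line (5)
                change = True
    return in_, out

blocks = ["b1", "b2", "b3", "b4"]  # visiting order
gen = {"b1": {"d1", "d2", "d3"}, "b2": {"d4", "d5"}, "b3": {"d6"}, "b4": {"d7"}}
kill = {"b1": {"d4", "d5", "d6", "d7"}, "b2": {"d1", "d2", "d7"},
        "b3": {"d3"}, "b4": {"d1", "d4"}}
# ASSUMED edges: b1->b2, b2->b3, b2->b4, b3->b4, b4->b2.
pred = {"b1": [], "b2": ["b1", "b4"], "b3": ["b2"], "b4": ["b2", "b3"]}

in_, out = reaching_definitions(blocks, pred, gen, kill)
print(sorted(in_["b2"]))  # ['d1', 'd2', 'd3', 'd5', 'd6', 'd7'] under the assumed edges
```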


Example for Reaching Definitions

Compute gen/kill and iterate (visiting order: b1, b2, b3, b4)

b1:  d1: i := m-1
     d2: j := n
     d3: a := u1
b2:  d4: i := i+1
     d5: j := j-1
b3:  d6: a := u2
b4:  d7: i := u3

gen[b1] := {d1, d2, d3}    kill[b1] := {d4, d5, d6, d7}
gen[b2] := {d4, d5}        kill[b2] := {d1, d2, d7}
gen[b3] := {d6}            kill[b3] := {d3}
gen[b4] := {d7}            kill[b4] := {d1, d4}

Worksheet (one bit per definition d1 to d7; the all-zero entries are placeholders to be filled in pass by pass):

         initial              pass 1               pass 2               pass 3
Block    in[B]     out[B]     in[B]     out[B]     in[B]     out[B]     in[B]     out[B]
b1       000 0000  000 0000   000 0000  000 0000   000 0000  000 0000   000 0000  000 0000
b2       000 0000  000 0000   000 0000  000 0000   000 0000  000 0000   000 0000  000 0000
b3       000 0000  000 0000   000 0000  000 0000   000 0000  000 0000   000 0000  000 0000
b4       000 0000  000 0000   000 0000  000 0000   000 0000  000 0000   000 0000  000 0000


Generalizations: Other Data-Flow Analyses

Reaching definitions is a (forward; some-path) analysis

For backward analysis:
• interchange in / out sets in the previous algorithm, lines (1-5), and take successors instead of predecessors in line (2)

For all-path analysis:
• intersection is substituted for union in line (2)


Common Subexpression Elimination

Rule used to eliminate a subexpression within a basic block
• The subexpression was already defined
• The value of the subexpression is not modified
  – i.e. none of the values needed to compute the subexpression are redefined
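Within a single basic block the rule can be applied in one forward scan. This is a rough sketch, assuming statements are (dst, op, a, b) tuples and that a reused value is expressed as a "copy" pseudo-operation; both conventions are made up for the example.

```python
def local_cse(block):
    """Replace recomputations of an already available expression by a copy."""
    available = {}  # (op, a, b) -> variable currently holding that value
    result = []
    for dst, op, a, b in block:
        key = (op, a, b)
        if key in available:
            result.append((dst, "copy", available[key], None))
        else:
            result.append((dst, op, a, b))
        # Assigning dst invalidates expressions that read dst or are held in dst.
        available = {k: v for k, v in available.items()
                     if dst not in (k[1], k[2]) and v != dst}
        if dst not in (a, b):
            available.setdefault(key, dst)
    return result

block = [
    ("t6", "*", "4", "i"),
    ("t7", "*", "4", "i"),   # same value as t6
    ("t8", "*", "4", "j"),
    ("i", "+", "i", "1"),    # redefines i, so 4*i is no longer available
    ("t9", "*", "4", "i"),   # must be recomputed
    ("t10", "*", "4", "j"),  # same value as t8
]
optimized = local_cse(block)
print(optimized[1])  # ('t7', 'copy', 't6', None)
```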

What about eliminating subexpressions across basic blocks?


Available Expressions

An expression x+y is available at a point p:
• if every path from the initial node to p evaluates x+y, and
• after the last such evaluation, prior to reaching p, there are no subsequent assignments to x or y.

Definitions:
• forward, all-path
• e-gen[S]: expressions definitely generated by S
  – e.g. “z := x+y”: expression “x+y” is generated
• e-kill[S]: expressions that may be killed by S
  – e.g. “z := x+y”: all expressions containing “z” are killed
• order: compute e-gen and then e-kill, e.g. for “x := x+y” the expression “x+y” is generated and then killed, so it is not available afterwards


Available Expressions (cont.)

Algorithm:

for each basic block B: out[B] := e-gen[B];                (1)
do
    change := false;
    for each basic block B do
        in[B] := ∩ (P a predecessor of B) out[P];          (2)
        old-out := out[B];                                 (3)
        out[B] := e-gen[B] ∪ ( in[B] - e-kill[B] );        (4)
        if (out[B] != old-out) then change := true;        (5)
    end
while change

difference: line (2), use intersection instead of union
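A possible Python rendering of the available-expressions iteration follows. It deviates from line (1) in one respect, which is an assumption of this sketch and not part of the slides: non-entry blocks start from the set of all expressions, a common choice for all-path problems so that loops do not lose precision, and in[entry] is fixed to the empty set. The four-block diamond and its expressions are hypothetical.

```python
def available_expressions(blocks, pred, e_gen, e_kill, entry):
    universe = set().union(*e_gen.values(), *e_kill.values())
    in_ = {b: set() for b in blocks}
    # ASSUMPTION: optimistic start (all expressions) everywhere but the entry.
    out = {b: set(universe) for b in blocks}
    out[entry] = set(e_gen[entry])
    change = True
    while change:
        change = False
        for b in blocks:
            if b != entry and pred[b]:
                # Line (2) with intersection in place of union.
                in_[b] = set.intersection(*(out[p] for p in pred[b]))
            new_out = e_gen[b] | (in_[b] - e_kill[b])
            if new_out != out[b]:
                out[b] = new_out
                change = True
    return in_, out

# Hypothetical diamond: B1 computes x+y and a*b, B3 assigns x.
blocks = ["B1", "B2", "B3", "B4"]
pred = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
e_gen = {"B1": {"x+y", "a*b"}, "B2": set(), "B3": set(), "B4": set()}
e_kill = {"B1": set(), "B2": set(), "B3": {"x+y"}, "B4": set()}

in_, out = available_expressions(blocks, pred, e_gen, e_kill, "B1")
print(sorted(in_["B4"]))  # ['a*b']
```

Only a*b is available at the join, because x+y is killed on the path through B3.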


Pointer Analysis

Identify the memory locations that may be addressed by a pointer
• may be formalized as a system of data-flow equations.

Simple programming model:
• pointer to integer (or float, arrays of integer, arrays of float)
• no pointers to pointers allowed

Definitions:
• in[S]: the set of pairs (p, a), where p is a pointer, a is a variable, and p might point to a before statement S.
• out[S]: the set of pairs (p, a), where p might point to a after statement S.
• gen[S]: the new pairs (p, a) generated by the statement S.
• kill[S]: the pairs (p, a) killed by the statement S.


Pointer Analysis (cont.)

S: a = b+c
gen[S] = { }
kill[S] = { }

S: p = &a
gen[S] = { (p, a) }
kill[S, input set] = { (p, b) | (p, b) is in input set }

S: p = q
gen[S, input set] = { (p, b) | (q, b) is in input set }
kill[S, input set] = { (p, b) | (p, b) is in input set }


Pointer Analysis (cont.)

Algorithm:

for each basic block B: out[B] := gen[B, { }];                             (1)
do
    change := false;
    for each basic block B do
        in[B] := ∪ (P a predecessor of B) out[P];                          (2)
        old-out := out[B];                                                 (3)
        out[B] := gen[B, in[B]] ∪ ( in[B] - kill[B, in[B]] );              (4)
        if (out[B] != old-out) then change := true;                        (5)
    end
while change

difference: line (4): gen and kill are functions of B and in[B].
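The pointer-analysis iteration can be sketched the same way, with gen and kill written as functions of the statement and the incoming set. For brevity each block is assumed to hold a single statement, encoded as a tuple ("addr", p, a) for p = &a, ("copy", p, q) for p = q, or ("other",); the encoding and the four-block example are assumptions of this sketch.

```python
def gen(stmt, in_set):
    if stmt[0] == "addr":                       # p = &a
        _, p, a = stmt
        return {(p, a)}
    if stmt[0] == "copy":                       # p = q
        _, p, q = stmt
        return {(p, b) for (r, b) in in_set if r == q}
    return set()                                # e.g. a = b+c

def kill(stmt, in_set):
    if stmt[0] in ("addr", "copy"):
        p = stmt[1]
        return {(r, b) for (r, b) in in_set if r == p}
    return set()

def pointer_analysis(blocks, pred, stmt):
    out = {b: gen(stmt[b], set()) for b in blocks}            # line (1)
    in_ = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            in_[b] = set().union(*(out[p] for p in pred[b]))  # line (2)
            old_out = out[b]                                  # line (3)
            out[b] = gen(stmt[b], in_[b]) | (in_[b] - kill(stmt[b], in_[b]))  # line (4)
            if out[b] != old_out:                             # line (5)
                change = True
    return in_, out

# Hypothetical diamond: B1: p = &a;  B2: p = &b;  B3: q = &c;  B4: q = p
blocks = ["B1", "B2", "B3", "B4"]
pred = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
stmt = {"B1": ("addr", "p", "a"), "B2": ("addr", "p", "b"),
        "B3": ("addr", "q", "c"), "B4": ("copy", "q", "p")}

in_, out = pointer_analysis(blocks, pred, stmt)
print(sorted(out["B4"]))  # [('p', 'a'), ('p', 'b'), ('q', 'a'), ('q', 'b')]
```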


Performance of Iterative Solutions

Global analysis may be memory-space / computing intensive

May be reduced by
• using bitvector representations for sets
• analyzing only relevant variables
  – e.g. temporary variables may be ignored
• synthesizing data-flow within basic block
• mixing inductive and iterative solutions
• suitably ordering the basic blocks
  – e.g. depth first order is good for forward analysis
• limiting scope
  – may reduce the precision of analysis
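As an illustration of the bitvector idea, a set of definitions can be packed into one integer so that union, intersection and difference become single bitwise operations. The helper names and the encoding (bit i-1 stands for definition d_i) are assumptions of this sketch.

```python
def to_bits(defs):
    """Encode a set like {"d1", "d3"} as an int with bit i-1 set for d_i."""
    bits = 0
    for d in defs:
        bits |= 1 << (int(d[1:]) - 1)
    return bits

def from_bits(bits):
    """Decode an int back into a set of definition names."""
    return {f"d{i + 1}" for i in range(bits.bit_length()) if (bits >> i) & 1}

def transfer(gen, kill, in_bits):
    """out = gen | (in & ~kill), the bitwise form of gen ∪ (in - kill)."""
    return gen | (in_bits & ~kill)

# Hypothetical block: gen = {d7}, kill = {d1, d4}, in = {d3, d4, d5, d6}.
out_bits = transfer(to_bits({"d7"}), to_bits({"d1", "d4"}),
                    to_bits({"d3", "d4", "d5", "d6"}))
print(sorted(from_bits(out_bits)))  # ['d3', 'd5', 'd6', 'd7']
```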


Summary

Iterative algorithm:
• solve data-flow problem for arbitrary control flow graph

To solve a new data-flow problem:
• define gen/kill accordingly
• determine properties:
  – forward / backward
  – some-path / all-path
