Data-Flow Analysis II
CS 671, March 13, 2008
2 CS 671 – Spring 2008
Data-Flow Analysis
Gather conservative, approximate information about what a program does
Result: some property that holds every time the instruction executes
The Data-Flow Abstraction
Execution of an instruction transforms program state
To analyze a program, we must consider all possible sequences of program points (paths)
Summarize all possible program states with a finite set of facts
• Limitation: may consider some infeasible paths
The General Approach
Set up and solve systems of equations that relate information at various points in the program, such as
out[S] = gen[S] ∪ ( in[S] - kill[S] )
where
– S is a statement
– in[S] and out[S] are the information before and after S
– gen[S] and kill[S] are the information generated and killed by S
The definition of in, out, gen, and kill depends on the desired information
Data-Flow Analysis (cont.)
Properties:
• either a forward analysis (out as a function of in) or a backward analysis (in as a function of out)
• either an “along some path” problem or an “along all paths” problem
• data-flow analysis must be conservative
Definitions:
• a point lies between two statements (or before the first statement and after the last)
• a path is a sequence of consecutive points in the control-flow graph
Example – Live Variables
Steps:
• Set up live sets for each program point
• Instantiate equations
• Solve equations
Statements of the example flow graph:
if (c)
x = y+1
y = 2*z
if (d)
x = y+z
z = 1
z = x
Example
Program points
[Figure: the example flow graph (if (c); x = y+1; y = 2*z; if (d); x = y+z; z = 1; z = x) annotated with program points L1 through L12]
Example
[Figure: the same flow graph with its statements numbered 1 through 7 and program points L1 through L12]
Stmt Defs Uses
1
2
3
4
5
6
7
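The Defs and Uses columns can be filled in mechanically. The following is a minimal sketch, assuming the statements are numbered 1 through 7 in the order they are listed on the slide (the figure that fixed the numbering is not available) and that every statement is either a simple assignment or an if-condition. The helper name `defs_uses` is hypothetical.

```python
import re

# Statements of the example. The 1-7 numbering is an assumption: it follows
# the order in which the statements appear in the transcript.
STATEMENTS = {
    1: "if (c)",
    2: "x = y+1",
    3: "y = 2*z",
    4: "if (d)",
    5: "x = y+z",
    6: "z = 1",
    7: "z = x",
}

# An identifier starts with a letter or underscore, so constants are skipped.
IDENT = re.compile(r"[A-Za-z_]\w*")


def defs_uses(stmt):
    """Return (defs, uses) for a simple assignment or an if-condition."""
    if stmt.startswith("if"):
        # A condition defines nothing and uses every variable it mentions.
        return set(), set(IDENT.findall(stmt[2:]))
    target, expr = stmt.split("=", 1)
    return {target.strip()}, set(IDENT.findall(expr))


for number, stmt in STATEMENTS.items():
    defs, uses = defs_uses(stmt)
    print(number, stmt, sorted(defs), sorted(uses))
```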
Example
[Figure: the same flow graph with its statements numbered 1 through 7]
L1 =
L2 =
L3 =
L4 =
L5 =
L6 =
L7 =
L8 =
L9 =
L10 =
L11 =
L12 =
in[I] = ( out[I] – def[I] ) ∪ use[I]
out[B] = ∪ in[B’] over all B’ ∈ succ(B)
Initially, every live set L1 through L12 = { }
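The two liveness equations can be solved by iterating until nothing changes. The sketch below is only an illustration: the control-flow edges in `SUCC` are an assumption (the original figure is not available), and the sets are kept per statement rather than per program point L1 through L12.

```python
# Hypothetical control-flow edges between the seven statements (assumed).
SUCC = {1: [2, 6], 2: [3], 3: [4], 4: [5, 7], 5: [7], 6: [7], 7: []}

# def/use sets read off the statements:
# 1: if (c)   2: x = y+1   3: y = 2*z   4: if (d)
# 5: x = y+z  6: z = 1     7: z = x
DEF = {1: set(), 2: {"x"}, 3: {"y"}, 4: set(), 5: {"x"}, 6: {"z"}, 7: {"z"}}
USE = {1: {"c"}, 2: {"y"}, 3: {"z"}, 4: {"d"}, 5: {"y", "z"}, 6: set(), 7: {"x"}}


def liveness(succ, defs, uses):
    """Solve in[I] = (out[I] - def[I]) | use[I], out[I] = union of in[succ]."""
    live_in = {n: set() for n in succ}
    live_out = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        # Liveness is a backward problem, so visit statements in reverse.
        for n in sorted(succ, reverse=True):
            out_n = set().union(*(live_in[s] for s in succ[n]))
            in_n = (out_n - defs[n]) | uses[n]
            if out_n != live_out[n] or in_n != live_in[n]:
                live_out[n], live_in[n] = out_n, in_n
                changed = True
    return live_in, live_out


LIVE_IN, LIVE_OUT = liveness(SUCC, DEF, USE)
for n in sorted(SUCC):
    print(n, sorted(LIVE_IN[n]), sorted(LIVE_OUT[n]))
```

Under the assumed edges, nothing is live after the last statement, and only x is live before it.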
More Terminology
Successors
Succ(B1) =
Succ(B2) =
Succ(B3) =
Predecessors
Pred(B2) =
Pred(B3) =
Pred(B4) =
[Figure: flow graph with blocks B1, B2, B3, and B4]
Branch node – more than one successor
Join node – more than one predecessor
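Successor and predecessor sets can be derived from an edge list. The sketch below assumes a diamond-shaped graph over B1 through B4. The edges are hypothetical, since the slide's figure is not available.

```python
from collections import defaultdict

# Hypothetical diamond-shaped flow graph (assumed edges).
EDGES = [("B1", "B2"), ("B1", "B3"), ("B2", "B4"), ("B3", "B4")]


def build_succ_pred(edges):
    """Return (succ, pred) maps from block name to a set of block names."""
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)
    return succ, pred


SUCC, PRED = build_succ_pred(EDGES)

# Branch node: more than one successor. Join node: more than one predecessor.
BRANCH_NODES = {n for n, targets in SUCC.items() if len(targets) > 1}
JOIN_NODES = {n for n, sources in PRED.items() if len(sources) > 1}
print(BRANCH_NODES, JOIN_NODES)
```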
Dominators
Dominance is a binary relation on the flow graph nodes that allows us to easily find loops
Node d dominates node i (d dom i) if every possible execution path from entry to i includes d
Dominance is:
• Reflexive – every node dominates itself
• Transitive – if a dom b and b dom c, then a dom c
• Antisymmetric – if a dom b and b dom a, then a = b
[Figure: flow graph with nodes entry, B1, B2, B3, B4, B5, B6, exit]
dom(entry) =
dom(b1) =
dom(b2) =
dom(b3) =
dom(b4) =
dom(b5) =
dom(b6) =
dom(exit) =
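Dominator sets can be computed as an all-paths data-flow problem: a node is dominated by itself plus whatever dominates all of its predecessors. The sketch below is one common way to do this, not necessarily the method used in the lecture, and the edges in `SUCC` are assumed because the figure is not available.

```python
# Hypothetical edges over the nodes named on the slide (assumed).
SUCC = {
    "entry": ["B1"],
    "B1": ["B2", "B3"],
    "B2": ["B4"],
    "B3": ["B4"],
    "B4": ["B5", "B6"],
    "B5": ["exit"],
    "B6": ["exit"],
    "exit": [],
}


def dominators(succ, entry):
    """Return dom[n], the set of nodes that dominate n."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    # Start from "everything dominates everything" and shrink.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            preds = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*preds) if preds else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


DOM = dominators(SUCC, "entry")
for n in SUCC:
    print(n, sorted(DOM[n]))
```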
Immediate dominators
idom(b) = a iff (a ≠ b) and (a dom b) and there does not exist a node c, different from a and b, such that (a dom c) and (c dom b)
• The idom of a node is unique
• The idom relationship forms a tree whose root is the entry node
[Figure: flow graph with nodes entry, B1, B2, B3, B4, B5, B6, exit]
idom(b1) =
idom(b2) =
idom(b3) =
idom(b4) =
idom(b5) =
idom(b6) =
idom(exit) =
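Given dominator sets, the immediate dominator can be picked out directly from the definition. The sketch below assumes the dominator sets are already known. The `DOM` table is for a hypothetical diamond-shaped graph (entry, B1, then B2 and B3 in parallel, then B4), not for the slide's figure.

```python
# Dominator sets for a hypothetical diamond-shaped graph (assumed).
DOM = {
    "entry": {"entry"},
    "B1": {"entry", "B1"},
    "B2": {"entry", "B1", "B2"},
    "B3": {"entry", "B1", "B3"},
    "B4": {"entry", "B1", "B4"},
}


def immediate_dominators(dom):
    """Return idom[b] for every node b (None for the entry node)."""
    idom = {}
    for b, doms in dom.items():
        strict = doms - {b}
        # The strict dominators of b form a chain under dominance. The
        # closest one is dominated by all the others, so it is the one
        # with the largest dominator set.
        idom[b] = max(strict, key=lambda a: len(dom[a])) if strict else None
    return idom


IDOM = immediate_dominators(DOM)
print(IDOM)
```

Reading `IDOM` as child-to-parent links gives the dominator tree rooted at the entry node.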
Strict Dominators and Postdominators
(d sdom i) if d dominates i and d ≠ i
(p pdom i) if every possible execution path from i to exit includes p
[Figure: flow graph with nodes entry, B1, B2, B3, B4, B5, B6, exit]
pdom(entry) =
pdom(b1) =
pdom(b2) =
pdom(b3) =
pdom(b4) =
pdom(b5) =
pdom(b6) =
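Postdominators mirror dominators: a node is postdominated by itself plus whatever postdominates all of its successors. The sketch below assumes a small hypothetical graph (a diamond between entry and exit), since the slide's edges are not available.

```python
# Hypothetical flow graph (assumed edges).
SUCC = {
    "entry": ["B1"],
    "B1": ["B2", "B3"],
    "B2": ["B4"],
    "B3": ["B4"],
    "B4": ["exit"],
    "exit": [],
}


def postdominators(succ, exit_node):
    """Return pdom[n], the set of nodes that postdominate n."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            below = [pdom[s] for s in succ[n]]
            new = {n} | (set.intersection(*below) if below else set())
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


PDOM = postdominators(SUCC, "exit")
# Strict postdominators drop the node itself, as in the sdom definition.
STRICT_PDOM = {n: PDOM[n] - {n} for n in PDOM}
for n in SUCC:
    print(n, sorted(PDOM[n]))
```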
Loops
Back edge – edge whose head dominates its tail
A loop containing this type of back edge is a natural loop
• i.e. it has a single external entry point
For back edge b → c the loop header is c
[Figure: flow graph with nodes entry, B1, B2, B3, exit]
Natural loops =
Loop header (B3 → B1) =
Loop header (B2 → B2) =
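Back edges and their natural loops can be found from dominator sets. The sketch below assumes a flow graph that contains the two back edges named on the slide (B3 → B1 and B2 → B2). The remaining edges are a guess, because the figure is not available. The natural loop of a back edge is taken to be the header plus every node that can reach the tail without passing through the header.

```python
# Hypothetical edges consistent with the two back edges on the slide.
SUCC = {
    "entry": ["B1"],
    "B1": ["B2"],
    "B2": ["B2", "B3"],
    "B3": ["B1", "exit"],
    "exit": [],
}


def predecessors(succ):
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    return pred


def dominators(succ, pred, entry):
    dom = {n: set(succ) for n in succ}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == entry:
                continue
            preds = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*preds) if preds else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def natural_loop(pred, tail, header):
    """Nodes of the natural loop of the back edge tail -> header."""
    loop, stack = {header}, [tail]
    while stack:
        m = stack.pop()
        if m not in loop:
            loop.add(m)
            stack.extend(pred[m])  # walk backwards, stopping at the header
    return loop


PRED = predecessors(SUCC)
DOM = dominators(SUCC, PRED, "entry")
# A back edge is an edge whose head (target) dominates its tail (source).
BACK_EDGES = [(t, h) for t in SUCC for h in SUCC[t] if h in DOM[t]]
LOOPS = {edge: natural_loop(PRED, *edge) for edge in BACK_EDGES}
print(LOOPS)
```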
Quicksort Example
How might we optimize this code?
b1:
  i := m-1
  j := n
  t1 := 4*n
  v := a[t1]
b2:
  i := i+1
  t2 := 4*i
  t3 := a[t2]
  if t3 < v goto b2
b3:
  j := j-1
  t4 := 4*j
  t5 := a[t4]
  if t5 > v goto b3
b4:
  if i >= j goto b6
b5:
  t6 := 4*i
  x := a[t6]
  t7 := 4*i
  t8 := 4*j
  t9 := a[t8]
  a[t7] := t9
  t10 := 4*j
  a[t10] := x
b6:
  t11 := 4*i
  x := a[t11]
  t12 := 4*i
  t13 := 4*n
  t14 := a[t13]
  a[t12] := t14
  t15 := 4*n
  a[t15] := x
[Quicksort] (the variables i, j, v, and x are needed outside this code)
Reaching Definitions
Informally:
• determine if a particular definition (e.g. “x” in “x = 5”) may reach a given point in the program
Why reaching definitions may be useful:
x := 5
y := x + 2
If “x := 5” is the only definition reaching “y := x+2”, the latter can be simplified to “y := 7” (constant propagation)
Reaching Definitions
Definition of a variable X:
• a statement that assigns (or may assign) a value to X
• unambiguous: X := 3
• ambiguous: foo(X) or *Y := 3
A definition d reaches a point p:
• if there is a path from the point immediately following d to p,
• such that d is not killed along that path.
A definition d of variable X is killed along path p
• if there is another definition of X along p.
Reaching Definitions (cont.)
Has the following properties:
• forward analysis
• “along some path” problem
Is conservative in that:
• definition d may not define variable X
• along a path p, there is another definition of X, but this other definition is ambiguous
• definition d may be killed along infeasible paths
Data-Flow Analysis: Structured Programs
Most programs are structured:
• sequence of statements
• if-then-else construct
• while-loops (including for-loops, loops with breaks, ...)
For these programs, we may use an inductive (syntax-driven) approach:
[Figure: statements 1, 2, and 3 are combined inductively, first into 2-3 and then into 1-2-3]
Reaching Definitions for Structured Programs
Sequence: S is S1 followed by S2
  gen[S] = gen[S2] ∪ ( gen[S1] - kill[S2] )
  kill[S] = kill[S2] ∪ ( kill[S1] - gen[S2] )
  in[S1] = in[S]
  in[S2] = out[S1]
  out[S] = out[S2]
Single definition: S is d: a = b+c
  gen[S] = {d}
  kill[S] = All-defs-of-a - {d}
  out[S] = gen[S] ∪ ( in[S] - kill[S] )
Reaching Definitions for Structured Programs (cont.)
Branch (if-then-else): S contains S1 and S2 as alternatives
  gen[S] = gen[S1] ∪ gen[S2]
  kill[S] = kill[S1] ∩ kill[S2]
  in[S1] = in[S2] = in[S]
  out[S] = out[S1] ∪ out[S2]
Loop: S contains S1 as the loop body
  gen[S] = gen[S1]
  kill[S] = kill[S1]
  in[S1] = in[S] ∪ gen[S1]
  out[S] = out[S1]
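The inductive rules for sequences, branches, and loops can be read as functions that combine (gen, kill) pairs bottom-up. The sketch below is an illustration only: the function names and the two definitions d1 and d2 (both assumed to define the same variable) are hypothetical.

```python
def single(d, all_defs_of_var):
    """S is one definition d: gen = {d}, kill = other defs of the variable."""
    return {d}, set(all_defs_of_var) - {d}


def seq(s1, s2):
    """S is S1 followed by S2."""
    gen1, kill1 = s1
    gen2, kill2 = s2
    return gen2 | (gen1 - kill2), kill2 | (kill1 - gen2)


def branch(s1, s2):
    """S is if-then-else with alternatives S1 and S2."""
    gen1, kill1 = s1
    gen2, kill2 = s2
    return gen1 | gen2, kill1 & kill2


def loop(s1):
    """S is a loop with body S1: gen and kill are those of the body."""
    return s1


def out_of(in_set, s):
    """out[S] = gen[S] | (in[S] - kill[S])."""
    gen, kill = s
    return gen | (in_set - kill)


# Two hypothetical definitions of the same variable.
D1 = single("d1", ["d1", "d2"])
D2 = single("d2", ["d1", "d2"])

print(seq(D1, D2))     # d1 then d2: only d2 survives
print(branch(D1, D2))  # either branch: both may reach, nothing surely killed
```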
Iterative Solution: Data-Flow Equations
The inductive approach is only applicable to structured programs
• because it uses the structure of the program to synthesize and distribute the data-flow information
Need a general technique: the iterative approach
• compute the gen/kill sets of each statement / basic block
• initialize the in/out sets
• repeatedly compute the out/in sets until a steady state is reached
Reaching Definitions
Reaching definitions:
• set of definitions that may reach (along one or more paths) a given point
• gen[S]: definition d is in gen[S] if d may reach the end of S, independently of whether it reaches the beginning of S.
• kill[S]: the set of definitions that never reach the end of S, even if they reach the beginning.
Equations:
• in[S] = ∪ out[P] over all predecessors P of S
• out[S] = gen[S] ∪ ( in[S] - kill[S] )
Reaching Definitions (cont.)
Algorithm:
for each basic block B: out[B] := gen[B];                      (1)
do
    change := false;
    for each basic block B do
        in[B] := ∪ out[P] over all predecessors P of B;        (2)
        old-out := out[B];                                     (3)
        out[B] := gen[B] ∪ (in[B] - kill[B]);                  (4)
        if (out[B] != old-out) then change := true;            (5)
    end
while change
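The algorithm translates almost line for line into Python sets. The sketch below keeps the numbered steps (1) to (5) as comments. The three-block graph and its gen/kill sets are hypothetical: B1 holds d1: x := 1, B2 holds d2: x := 2, and B3 defines nothing.

```python
def reaching_definitions(blocks, pred, gen, kill):
    """Iterate the reaching-definitions equations to a steady state."""
    out = {b: set(gen[b]) for b in blocks}                     # (1)
    in_ = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            in_[b] = set().union(*(out[p] for p in pred[b]))   # (2)
            old_out = out[b]                                   # (3)
            out[b] = gen[b] | (in_[b] - kill[b])               # (4)
            if out[b] != old_out:                              # (5)
                change = True
    return in_, out


# Hypothetical example: edges B1 -> B2, B1 -> B3, B2 -> B3.
BLOCKS = ["B1", "B2", "B3"]
PRED = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}
GEN = {"B1": {"d1"}, "B2": {"d2"}, "B3": set()}
KILL = {"B1": {"d2"}, "B2": {"d1"}, "B3": set()}

IN, OUT = reaching_definitions(BLOCKS, PRED, GEN, KILL)
for b in BLOCKS:
    print(b, sorted(IN[b]), sorted(OUT[b]))
```

Both definitions reach B3 because it can be entered directly from B1 or through B2.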
Example for Reaching Definitions
b1:
  d1: i := m-1
  d2: j := n
  d3: a := u1
b2:
  d4: i := i+1
  d5: j := j-1
b3:
  d6: a := u2
b4:
  d7: i := u3
[Figure: flow graph over blocks b1, b2, b3, b4]
Compute gen/kill and iterate (visiting order: b1, b2, b3, b4)
gen[b1] := {d1, d2, d3}    kill[b1] := {d4, d5, d6, d7}
gen[b2] := {}              kill[b2] := {}
gen[b3] := {}              kill[b3] := {}
gen[b4] := {}              kill[b4] := {}
Block  initial              pass 1               pass 2               pass 3
       in[B]     out[B]     in[B]     out[B]     in[B]     out[B]     in[B]     out[B]
b1     000 0000  000 0000   000 0000  000 0000   000 0000  000 0000   000 0000  000 0000
b2     000 0000  000 0000   000 0000  000 0000   000 0000  000 0000   000 0000  000 0000
b3     000 0000  000 0000   000 0000  000 0000   000 0000  000 0000   000 0000  000 0000
b4     000 0000  000 0000   000 0000  000 0000   000 0000  000 0000   000 0000  000 0000
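The gen and kill sets of each block follow from the list of definitions alone, so they can be computed without the flow graph. The sketch below does this and prints each set in the seven-bit "ddd dddd" layout used on the slide (bit order d1 through d7). The function names are hypothetical. The flow-graph edges needed for the iteration itself are not given in the transcript, so the iteration is not attempted here.

```python
# Definition -> variable it defines, in the order d1..d7 (from the slide).
DEFS = {"d1": "i", "d2": "j", "d3": "a", "d4": "i", "d5": "j", "d6": "a", "d7": "i"}
# Block -> definitions it contains, in program order (from the slide).
BLOCKS = {"b1": ["d1", "d2", "d3"], "b2": ["d4", "d5"], "b3": ["d6"], "b4": ["d7"]}


def gen_kill(block_defs, defs):
    """Compose per-definition gen/kill over a block, statement by statement."""
    gen, kill = set(), set()
    for d in block_defs:
        others = {e for e, var in defs.items() if var == defs[d] and e != d}
        gen = (gen - others) | {d}
        kill = (kill | others) - {d}
    return gen, kill


def bits(defset, defs):
    """Render a set of definitions as a bit string such as '111 0000'."""
    vector = "".join("1" if d in defset else "0" for d in defs)
    return vector[:3] + " " + vector[3:]


GEN_KILL = {b: gen_kill(ds, DEFS) for b, ds in BLOCKS.items()}
for b, (gen, kill) in GEN_KILL.items():
    print(b, "gen =", bits(gen, DEFS), "kill =", bits(kill, DEFS))
```

For b1 this reproduces the sets given on the slide, gen = {d1, d2, d3} and kill = {d4, d5, d6, d7}.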
Generalizations: Other Data-Flow Analyses
Reaching definitions is a (forward; some-path) analysis
For backward analysis:
• interchange in / out sets (and successors for predecessors) in the previous algorithm, lines (1-5)
For all-path analysis:
• intersection is substituted for union in line (2)
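Both generalizations can be expressed as switches on one solver. The sketch below is a hypothetical `solve` function, not the course's code: `backward=True` reverses the edges so that in and out trade places, and `all_paths=True` replaces union with intersection in step (2). It returns two maps, the combined facts flowing into each node and the facts after applying gen/kill, in the direction of the analysis.

```python
def solve(nodes, edges, gen, kill, backward=False, all_paths=False):
    """Generic gen/kill solver; returns (meet, result) per node."""
    if backward:
        edges = [(b, a) for a, b in edges]       # swap the roles of in and out
    inputs = {n: [a for a, b in edges if b == n] for n in nodes}
    result = {n: set(gen[n]) for n in nodes}     # line (1)
    meet = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            facts = [result[p] for p in inputs[n]]
            if not facts:
                meet[n] = set()
            elif all_paths:
                meet[n] = set.intersection(*facts)   # line (2), all paths
            else:
                meet[n] = set.union(*facts)          # line (2), some path
            new = gen[n] | (meet[n] - kill[n])       # line (4)
            if new != result[n]:
                result[n] = new
                changed = True
    return meet, result


# Liveness as a backward, some-path instance with gen = use and kill = def.
# Hypothetical straight-line code: 1: x = ...; 2: y = x; 3: return y
NODES = [1, 2, 3]
EDGES = [(1, 2), (2, 3)]
USE = {1: set(), 2: {"x"}, 3: {"y"}}
DEF = {1: {"x"}, 2: {"y"}, 3: set()}

LIVE_OUT, LIVE_IN = solve(NODES, EDGES, USE, DEF, backward=True)
print(LIVE_IN, LIVE_OUT)
```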
Common Subexpression Elimination
Rule used to eliminate a subexpression within a basic block:
• The subexpression was already defined
• The value of the subexpression is not modified
  – i.e. none of the values needed to compute the subexpression are redefined
What about eliminating subexpressions across basic blocks?
Available Expressions
An expression x+y is available at a point p:
• if every path from the initial node to p evaluates x+y, and
• after the last such evaluation, prior to reaching p, there are no subsequent assignments to x or y.
Definitions:
• forward, all-path
• e-gen[S]: expressions definitely generated by S
  – e.g. “z := x+y”: expression “x+y” is generated
• e-kill[S]: expressions that may be killed by S
  – e.g. “z := x+y”: all expressions containing “z” are killed
• order: apply e-gen first and then e-kill, e.g. after “x := x+y” the expression “x+y” is not available, because x has been reassigned
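The ordering rule matters when a statement reassigns one of its own operands. The sketch below computes e-gen and e-kill for a block by processing statements in order, generating first and killing second. The tuple encoding of statements and the function name are assumptions made for the example.

```python
def block_e_gen_kill(statements, all_exprs):
    """statements: list of (target, (operand1, operator, operand2))."""
    e_gen, e_kill = set(), set()
    for target, expr in statements:
        # Step 1: the statement evaluates (generates) its right-hand side.
        e_gen.add(expr)
        # Step 2: assigning to target kills every expression that uses it,
        # including the one just generated if target is one of its operands.
        killed = {e for e in all_exprs if target in (e[0], e[2])}
        e_gen -= killed
        e_kill |= killed
    return e_gen, e_kill


X_PLUS_Y = ("x", "+", "y")
Z_TIMES_2 = ("z", "*", "2")
ALL_EXPRS = {X_PLUS_Y, Z_TIMES_2}

# z := x+y generates x+y and kills expressions containing z.
print(block_e_gen_kill([("z", X_PLUS_Y)], ALL_EXPRS))
# x := x+y generates x+y and then kills it again, because x changes.
print(block_e_gen_kill([("x", X_PLUS_Y)], ALL_EXPRS))
```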
Available Expressions (cont.)
Algorithm:
for each basic block B: out[B] := e-gen[B];                    (1)
do
    change := false;
    for each basic block B do
        in[B] := ∩ out[P] over all predecessors P of B;        (2)
        old-out := out[B];                                     (3)
        out[B] := e-gen[B] ∪ (in[B] - e-kill[B]);              (4)
        if (out[B] != old-out) then change := true;            (5)
    end
while change
Difference: line (2) uses intersection instead of union
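A sketch of the algorithm in Python follows. It assumes that the entry block, which has no predecessors, starts with an empty in set. The `optimistic` flag is an addition that is not on the slide: it initializes every non-entry out set to the set of all expressions. This is a common alternative, because initializing with e-gen as in line (1) is safe but can miss expressions that stay available around a loop. The two-block graph is hypothetical: B1 computes x+y, and B2 is a loop body that neither computes nor kills it.

```python
def available_expressions(blocks, pred, e_gen, e_kill, entry, optimistic=False):
    """Iterate the available-expressions equations to a steady state."""
    universe = set().union(*e_gen.values(), *e_kill.values())
    out = {}
    for b in blocks:
        if optimistic and b != entry:
            out[b] = set(universe)      # alternative start (not on the slide)
        else:
            out[b] = set(e_gen[b])      # line (1)
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            preds = [out[p] for p in pred[b]]
            # Line (2): intersection; no predecessors means nothing available.
            in_[b] = set.intersection(*preds) if preds else set()
            new_out = e_gen[b] | (in_[b] - e_kill[b])           # line (4)
            if new_out != out[b]:
                out[b], changed = new_out, True
    return in_, out


# Hypothetical graph: B1 -> B2 and B2 -> B2 (a self loop).
BLOCKS = ["B1", "B2"]
PRED = {"B1": [], "B2": ["B1", "B2"]}
E_GEN = {"B1": {"x+y"}, "B2": set()}
E_KILL = {"B1": set(), "B2": set()}

IN_SLIDE, _ = available_expressions(BLOCKS, PRED, E_GEN, E_KILL, "B1")
IN_OPT, _ = available_expressions(BLOCKS, PRED, E_GEN, E_KILL, "B1", optimistic=True)
print(IN_SLIDE["B2"], IN_OPT["B2"])
```

In this example the slide's initialization reports nothing available at the top of B2, while the optimistic start finds x+y. Both answers are safe, and the second is more precise.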
Pointer Analysis
Identify the memory locations that may be addressed by a pointer
• may be formalized as a system of data-flow equations
Simple programming model:
• pointers to integers (or floats, arrays of integers, arrays of floats)
• no pointers to pointers allowed
Definitions:
• in[S]: the set of pairs (p, a), where p is a pointer, a is a variable, and p might point to a before statement S
• out[S]: the set of pairs (p, a), where p might point to a after statement S
• gen[S]: the new pairs (p, a) generated by the statement S
• kill[S]: the pairs (p, a) killed by the statement S
Pointer Analysis (cont.)
S: a = b+c
  gen[S] = { }
  kill[S] = { }
S: p = &a
  gen[S] = { (p, a) }
  kill[S, input set] = { (p, b) | (p, b) is in input set }
S: p = q
  gen[S, input set] = { (p, b) | (q, b) is in input set }
  kill[S, input set] = { (p, b) | (p, b) is in input set }
[Figure: each statement S receives an input set]
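The three statement forms can be folded into one transfer function that maps an input set of (pointer, variable) pairs to an output set. The sketch below uses a hypothetical tuple encoding for statements: ("addr", p, a) for p = &a, ("copy", p, q) for p = q, and anything else for a statement that does not touch pointers.

```python
def transfer(stmt, in_set):
    """Return out[S] = gen[S, in] | (in - kill[S, in]) for one statement."""
    kind = stmt[0]
    if kind == "addr":        # p = &a
        _, p, a = stmt
        gen = {(p, a)}
        kill = {(r, b) for (r, b) in in_set if r == p}
    elif kind == "copy":      # p = q
        _, p, q = stmt
        gen = {(p, b) for (r, b) in in_set if r == q}
        kill = {(r, b) for (r, b) in in_set if r == p}
    else:                     # e.g. a = b+c: no pointer is changed
        gen, kill = set(), set()
    return gen | (in_set - kill)


facts = set()
facts = transfer(("addr", "p", "a"), facts)   # p = &a
facts = transfer(("addr", "q", "b"), facts)   # q = &b
AFTER_ADDR = set(facts)
facts = transfer(("copy", "p", "q"), facts)   # p = q
AFTER_COPY = set(facts)
print(sorted(AFTER_ADDR), sorted(AFTER_COPY))
```

After p = q, the pair (p, a) is gone and p points wherever q may point.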
Pointer Analysis (cont.)
Algorithm:
for each basic block B: out[B] := gen [];                              (1)
do
    change := false;
    for each basic block B do
        in[B] := ∪ out[P] over all predecessors P of B;                (2)
        old-out := out[B];                                             (3)
        out[B] := gen[B, in[B]] ∪ ( in[B] - kill[B, in[B]] );          (4)
        if (out[B] != old-out) then change := true;                    (5)
    end
while change
Difference: in line (4), gen and kill are functions of B and in[B].
Performance of Iterative Solutions
Global analysis may be memory-space / computing intensive
The cost may be reduced by
• using bitvector representations for sets
• analyzing only relevant variables
  – e.g. temporary variables may be ignored
• synthesizing data-flow within basic blocks
• mixing inductive and iterative solutions
• suitably ordering the basic blocks
  – e.g. depth-first order is good for forward analysis
• limiting scope
  – may reduce the precision of analysis
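With a bitvector representation, each fact gets one bit and the transfer function becomes a few machine-word operations. The sketch below uses Python integers as bitvectors. The seven definition names and the sample gen/kill/in sets are hypothetical.

```python
# One bit per definition; the universe of facts is fixed up front.
UNIVERSE = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]
INDEX = {d: i for i, d in enumerate(UNIVERSE)}


def encode(defs):
    """Pack a set of definition names into an integer bitvector."""
    bits = 0
    for d in defs:
        bits |= 1 << INDEX[d]
    return bits


def decode(bits):
    """Unpack an integer bitvector into a set of definition names."""
    return {d for d, i in INDEX.items() if (bits >> i) & 1}


def transfer(gen, kill, in_bits):
    """out = gen | (in - kill), with set difference written as AND NOT."""
    return gen | (in_bits & ~kill)


GEN = encode({"d4", "d5"})
KILL = encode({"d1", "d2", "d7"})
IN = encode({"d1", "d2", "d3"})
OUT = transfer(GEN, KILL, IN)
print(sorted(decode(OUT)))
```

Union, intersection, and difference each become a single bitwise operation on the whole set, which is where the savings come from.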
Summary
Iterative algorithm:
• solves data-flow problems for an arbitrary control-flow graph
To solve a new data-flow problem:
• define gen/kill accordingly
• determine properties:
  – forward / backward
  – some-path / all-path