Proliferation of Data-flow Problems
Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.
Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.
Comp 512, Spring 2011
Data-flow Basics: Review from Last Lecture
• Graph, sets, equations, & initial values define a problem
  - Sets form a semilattice
  - Equations treated as transfer functions
  - Properties of semilattice & transfer functions determine termination and correctness
• Dominator calculation One of the simplest forward data-flow problems
• Liveness information A typical (backwards) global data-flow problem
• Both DOM and LIVE are well behaved Admissible frameworks with unique fixed points
• Postorder, rPostorder, Preorder, & rPreorder
COMP 512, Rice University
Today’s Lecture
• Framework properties and speed of convergence
  - Kam-Ullman rapid condition
  - Worklist iterative algorithm
• More data-flow problems
  - Available expressions
  - Very busy expressions
  - Constant propagation (Kildall style, not SCCP)
• Problems that don’t fit the Kam-Ullman model for speed
Iterative Data-flow Analysis
Correctness
• If every f ∈ F is monotone, i.e., f(x ∧ y) ≤ f(x) ∧ f(y), and
• If the lattice is bounded, i.e., every descending chain is finite
  - Chain is a sequence x1, x2, …, xn where xi ∈ L, 1 ≤ i ≤ n
  - xi > xi+1, 1 ≤ i < n ⇒ chain is descending
Then
• The round-robin algorithm computes a least fixed-point (LFP)
• The uniqueness of the solution depends on other properties of F
• Unique solution ⇒ it finds the one we want
• Multiple solutions ⇒ we need to know which one it finds
Reference material
Iterative Data-flow Analysis
Correctness
• Does the iterative algorithm compute the desired answer?
Admissible Function Spaces
1. ∀ f ∈ F, ∀ x,y ∈ L, f(x ∧ y) = f(x) ∧ f(y)
2. ∃ fi ∈ F such that ∀ x ∈ L, fi(x) = x
3. f,g ∈ F ⇒ ∃ h ∈ F such that h(x) = f(g(x))
4. ∀ x ∈ L, ∃ a finite subset H ⊆ F such that x = ∧ f ∈ H f(⊥)
If F meets these four conditions, then an instance of the problem (instance graph + initial values) will have a unique fixed-point solution
LFP = MFP = MOP ⇒ order of evaluation does not matter
Both DOM & LIVE meet all four criteria
If meet does not distribute over function application, then the fixed-point solution may not be unique. The iterative algorithm will find an LFP.
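The first admissibility condition can be checked by brute force on a small lattice. The sketch below is only an illustration: it assumes a powerset lattice over three names and a transfer function of the gen/kill form f(x) = gen ∪ (x - kill), the form used by LIVE. The helper names `powerset`, `make_transfer`, and `distributes` are hypothetical, not part of the lecture.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, i.e. every element of the lattice L."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def make_transfer(gen, kill):
    """Build f(x) = gen | (x - kill), a gen/kill transfer function."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda x: gen | (x - kill)

def distributes(f, lattice, meet):
    """Condition 1 by exhaustion: f(x ^ y) == f(x) ^ f(y) for all x, y in L."""
    return all(f(meet(x, y)) == meet(f(x), f(y))
               for x in lattice for y in lattice)

lattice = powerset({"a", "b", "c"})
f = make_transfer(gen={"a"}, kill={"b"})
union_ok = distributes(f, lattice, lambda x, y: x | y)   # meet is union (LIVE)
inter_ok = distributes(f, lattice, lambda x, y: x & y)   # meet is intersection (AVAIL)
print(union_ok, inter_ok)
```

An exhaustive check like this only covers one function on one tiny lattice; it is a sanity test, not a substitute for the algebraic proof.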
Reference material
Iterative Data-flow Analysis
Speed
• For a problem with an admissible function space & a bounded semilattice,
• If the functions all meet the rapid condition, i.e.,
  ∀ f,g ∈ F, ∀ x ∈ L, f(g(⊥)) ≥ g(⊥) ∧ f(x) ∧ x
then a round-robin, reverse-postorder iterative algorithm will halt in d(G)+3 passes over a graph G
• d(G) is the loop-connectedness of the graph w.r.t. a DFST
  - Maximal number of back edges in an acyclic path
  - Several studies suggest that, in practice, d(G) is small (<3)
  - For most CFGs, d(G) is independent of the specific DFST
• Sets stabilize in two passes around a loop
• Each pass does O(E) meets & O(N) other operations
Both DOM & LIVE are rapid frameworks
Iterative Data-flow Analysis
What does this mean?
• Reverse postorder
  - Easily computed order that increases propagation per pass
• Round-robin iterative algorithm
  - Visit all the nodes in a consistent order (RPO)
  - Do it again until the sets stop changing
• Rapid condition
  - Most classic global data-flow problems meet this condition
These conditions are easily met
  - Admissible framework, rapid function space
  - Round-robin, reverse-postorder, iterative algorithm
The analysis runs in (effectively) linear time
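Reverse postorder is described above as "easily computed". One way to compute it is a depth-first search that records each node after all of its successors, then reverses the list. This is a minimal sketch; the CFG dictionary and the block names B0 to B3 are made up for illustration.

```python
def reverse_postorder(succ, entry):
    """Return the nodes reachable from `entry` in reverse postorder."""
    visited, order = set(), []

    def dfs(n):
        visited.add(n)
        for s in succ.get(n, []):
            if s not in visited:
                dfs(s)
        order.append(n)          # postorder: a node follows its successors

    dfs(entry)
    order.reverse()              # reversing postorder gives RPO
    return order

# Hypothetical CFG: B1 is a loop header, B2 the loop body, B3 the exit.
cfg = {"B0": ["B1"], "B1": ["B2", "B3"], "B2": ["B1"], "B3": []}
rpo = reverse_postorder(cfg, "B0")
print(rpo)
```

In RPO a node appears before its successors except along back edges, which is why a forward problem propagates far in a single pass. The recursive search is fine for small graphs; a large CFG would want an explicit stack.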
The Round-robin Iterative Algorithm
LIVEOUT(nf) ← Ø
for all other nodes n
    LIVEIN(n) ← UEVAR(n)
change ← true
while (change)
    change ← false
    for i ← 0 to N
        n ← rPreorder(i)
        TEMP ← ∪ s ∈ succ(n) LIVEIN(s)                          (meet operation)
        if LIVEOUT(n) ≠ TEMP then
            change ← true
            LIVEOUT(n) ← TEMP
            LIVEIN(n) ← UEVAR(n) ∪ (LIVEOUT(n) - VARKILL(n))    (transfer function)
We could, for brevity, fold the transfer function into the computation of LIVEOUT and eliminate the need to represent LIVEIN.
We use this formulation for clarity.
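The round-robin algorithm translates almost line for line into Python. The sketch below is illustrative only: it assumes the CFG is a dictionary of successor lists and that UEVAR and VARKILL have already been computed per block. The four-block loop example and the visiting order are invented for this note.

```python
def live_round_robin(succ, uevar, varkill, order):
    """Round-robin LIVE solver; returns (LIVEIN, LIVEOUT, passes)."""
    live_in = {n: set(uevar[n]) for n in succ}
    live_out = {n: set() for n in succ}
    passes, changed = 0, True
    while changed:
        changed = False
        passes += 1
        for n in order:
            temp = set()
            for s in succ[n]:
                temp |= live_in[s]                 # meet: union over successors
            if temp != live_out[n]:
                changed = True
                live_out[n] = temp
                # transfer function: UEVAR(n) | (LIVEOUT(n) - VARKILL(n))
                live_in[n] = uevar[n] | (live_out[n] - varkill[n])
    return live_in, live_out, passes

# Hypothetical code:  B0: i = 1      B1: if i < n goto B2 else B3
#                     B2: s = s + i; i = i + 1; goto B1      B3: return s
succ = {"B0": ["B1"], "B1": ["B2", "B3"], "B2": ["B1"], "B3": []}
uevar = {"B0": set(), "B1": {"i", "n"}, "B2": {"s", "i"}, "B3": {"s"}}
varkill = {"B0": {"i"}, "B1": set(), "B2": {"s", "i"}, "B3": set()}

live_in, live_out, passes = live_round_robin(
    succ, uevar, varkill, order=["B3", "B2", "B1", "B0"])
print(live_out)
```

Any consistent visiting order reaches the same fixed point; the order only affects how many passes are needed.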
The Worklist Iterative Algorithm
Worklist ← { all nodes, ni }
LIVEOUT(nf) ← Ø
for all other nodes n
    LIVEIN(n) ← UEVAR(n)
while (Worklist ≠ Ø)
    remove a node n from Worklist
    TEMP ← ∪ s ∈ succ(n) LIVEIN(s)
    if LIVEOUT(n) ≠ TEMP then
        LIVEOUT(n) ← TEMP
        LIVEIN(n) ← UEVAR(n) ∪ (LIVEOUT(n) - VARKILL(n))
        Worklist ← Worklist ∪ preds(n)
This algorithm avoids re-evaluating LIVEOUT when none of its constituents has changed. Thus, it does fewer set operations on LIVEOUT & LIVEIN than the round-robin algorithm. It does manipulate the Worklist, which is an added cost.
Implement the Worklist as a set – no duplicate entries
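A possible Python rendering of the worklist algorithm follows. It is a sketch under assumptions: the worklist is a FIFO queue paired with a membership set so that no node is queued twice, and the CFG and local sets are the same invented loop example (B0 to B3) used purely for illustration.

```python
from collections import deque

def live_worklist(succ, uevar, varkill):
    """Worklist LIVE solver; returns (LIVEIN, LIVEOUT)."""
    preds = {n: [] for n in succ}
    for n, targets in succ.items():
        for s in targets:
            preds[s].append(n)

    live_in = {n: set(uevar[n]) for n in succ}
    live_out = {n: set() for n in succ}
    worklist = deque(succ)       # start with every node
    queued = set(succ)           # membership set: no duplicate entries
    while worklist:
        n = worklist.popleft()
        queued.discard(n)
        temp = set()
        for s in succ[n]:
            temp |= live_in[s]
        if temp != live_out[n]:
            live_out[n] = temp
            live_in[n] = uevar[n] | (temp - varkill[n])
            for p in preds[n]:   # only predecessors can be affected
                if p not in queued:
                    queued.add(p)
                    worklist.append(p)
    return live_in, live_out

succ = {"B0": ["B1"], "B1": ["B2", "B3"], "B2": ["B1"], "B3": []}
uevar = {"B0": set(), "B1": {"i", "n"}, "B2": {"s", "i"}, "B3": {"s"}}
varkill = {"B0": {"i"}, "B1": set(), "B2": {"s", "i"}, "B3": set()}
live_in, live_out = live_worklist(succ, uevar, varkill)
print(live_out)
```

The FIFO order here is an arbitrary choice; as the next slide notes, many worklist data structures and orders are possible and the choice has a measurable effect.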
Round-robin Versus Worklist Algorithm
Termination, correctness, & complexity arguments are for the round-robin version
• The worklist algorithm does less work; it visits fewer nodes
• Consider a version with two worklists: current & next
  - Draw nodes from current in RPO; add them to next
  - Swap the worklists when current becomes empty (1 r-r pass)
  - Makes same changes, in same order, as round robin
  - Termination & correctness arguments should carry over
• Is this the best order? Maybe not …
  - Iterate around a loop until it stabilizes
  - Many data structures for worklist, many orders
  - Your mileage will vary (measurable effect)
Worklist algorithm is a sparse version of round-robin algorithm
Iterative Data-flow Analysis
How do we use these results?
• Prove that data-flow framework is admissible & rapid
  - It’s just algebra
  - Most (but not all) global data-flow problems are rapid
  - This is a property of F
• Code up an iterative algorithm
  - World’s simplest data-flow algorithm
• Rely on the results
  - Theoretically sound, robust algorithm
This lets us ignore most of the other data-flow algorithms in 512
Live Variables
Transformation: Eliminating unneeded stores
• e in a register, have seen last definition, never again used
• The store is dead (except for debugging)
• Compiler can eliminate the store
Data-flow problem: Live variables
LIVEOUT(b) = ∪ s ∈ succ(b) (UEVAR(s) ∪ (LIVEOUT(s) - VARKILL(s)))
LIVEOUT(nf) = Ø
• LIVEOUT(b) is the set of variables live on exit from b
• VARKILL(b) is the set of variables that are redefined in b
• UEVAR(b) is the set of variables used before any redefinition in b
Live analysis is a backward flow problem
|LIVEOUT| = |variables|
LIVE plays an important role in both register allocation and the pruned-SSA construction.
Form of f is same as in AVAIL
Computing Available Expressions
An expression is available on entry to b if, along every path from the entry node to b, it has been computed & none of its constituent subexpressions has been subsequently modified.
AVAIL(b) = ∩ x ∈ pred(b) (DEEXPR(x) ∪ (AVAIL(x) - EXPRKILL(x)))
where
• EXPRKILL(b) is the set of expressions killed in b, and
• DEEXPR(b) is the set of expressions defined in b and not subsequently killed in b
Initial conditions
AVAIL(n0) = Ø, because nothing is computed before n0
AVAIL(n) = { all expressions } for all other nodes n, because ∧ is ∩
Available expressions is a forward data-flow problem
|AVAIL| = |expressions|
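The AVAIL equation and its initial conditions can be exercised with a small round-robin solver. This is a sketch only: expressions are represented as plain strings, the diamond-shaped CFG is invented, and the DEEXPR and EXPRKILL sets are written by hand rather than derived from code.

```python
def available_expressions(preds, entry, deexpr, exprkill, all_exprs):
    """Forward problem with meet = intersection over predecessors."""
    avail = {n: set(all_exprs) for n in preds}   # AVAIL(n) = all expressions
    avail[entry] = set()                         # AVAIL(n0) = empty set
    changed = True
    while changed:
        changed = False
        for n in preds:
            if n == entry:
                continue
            new = set(all_exprs)
            for x in preds[n]:
                new &= deexpr[x] | (avail[x] - exprkill[x])
            if new != avail[n]:
                avail[n] = new
                changed = True
    return avail

# Hypothetical diamond:  B0: x = a+b        B1: a = 1; y = c+d
#                        B2: z = c+d        B3: join of B1 and B2
preds = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}
deexpr = {"B0": {"a+b"}, "B1": {"c+d"}, "B2": {"c+d"}, "B3": set()}
exprkill = {"B0": set(), "B1": {"a+b"}, "B2": set(), "B3": set()}
avail = available_expressions(preds, "B0", deexpr, exprkill, {"a+b", "c+d"})
print(avail["B3"])
```

At B3 only c+d survives, because a+b is killed along the path through B1 and the meet is intersection.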
Using Available Expressions
AVAIL is the basis for global common subexpression elimination
• If x+y ∈ AVAIL(b) then the compiler can replace any upwards exposed occurrence of x+y in b with a reference to the previously computed value
• Standard caveats apply about preserving prior computations
  - Hash x+y and assign temporary names; copy all x+y to name
  - Walk backward through code to find minimal set of copies
  - Other clever schemes abound
• Resulting algorithm is similar to value numbering
  - Based on textual identity of expressions, not value identity
  - Recognizes redundancy carried along loop-closing branch
• The classic form of GCSE (Cocke, SIGPLAN Conference, 1970)
Very Busy Expressions
An expression e is very busy on exit from n, iff e is evaluated & used along every path from n to nf AND evaluating e at the end of n would produce the same result as the next evaluation along those paths
The Analysis
• Annotate each block n with a set VERYBUSY(n) that contains each expression that is very busy on exit from n
• Perform analysis to find opportunities
• Problem is a backward problem
Initialization(s)
VERYBUSY(nf ) = Ø
VERYBUSY(n) = ?
Recurrence Equation
VERYBUSY(n) = ?
[Figure: control-flow graph fragment in which two blocks each contain …x+y…]
Hint: Use UEEXPR(n) and EXPRKILL(n)
Very Busy Expressions
Transformation: Hoisting
• e defined in every successor of b
• Evaluating e in b produces same result
• Saves code space, but shortens no path
Data-flow problem: Very Busy Expressions
VERYBUSY(b) = ∩ s ∈ succ(b) (UEEXPR(s) ∪ (VERYBUSY(s) - EXPRKILL(s)))
VERYBUSY(nf) = Ø
• VERYBUSY(b) contains expressions that are very busy at end of b
• UEEXPR(b) is the set of expressions used before they are killed in b
• EXPRKILL(b) is the set of expressions killed before they are used in b
VERYBUSY expressions is a backward flow problem
|VERYBUSY| = |expressions|
Form of f is same as in LIVE
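Because the form of f matches LIVE and the meet matches AVAIL, a solver for VERYBUSY is a small variation on both. The sketch below makes one assumption the slides leave open: every block other than nf starts at { all expressions }, mirroring the AVAIL initialization for an intersection problem. The CFG and the hand-written UEEXPR and EXPRKILL sets are hypothetical.

```python
def very_busy(succ, exit_node, ueexpr, exprkill, all_exprs):
    """Backward problem with meet = intersection over successors."""
    vb = {n: set(all_exprs) for n in succ}   # assumed initial value
    vb[exit_node] = set()                    # VERYBUSY(nf) = empty set
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == exit_node:
                continue
            new = set(all_exprs)
            for s in succ[n]:
                new &= ueexpr[s] | (vb[s] - exprkill[s])
            if new != vb[n]:
                vb[n] = new
                changed = True
    return vb

# Hypothetical diamond: both branches evaluate x+y before anything kills it.
#   B0: branch      B1: t = x+y      B2: u = x+y      B3: exit (nf)
succ = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
ueexpr = {"B0": set(), "B1": {"x+y"}, "B2": {"x+y"}, "B3": set()}
exprkill = {"B0": set(), "B1": set(), "B2": set(), "B3": set()}
vb = very_busy(succ, "B3", ueexpr, exprkill, {"x+y"})
print(vb["B0"])
```

Here x+y is very busy on exit from B0, so hoisting its evaluation into B0 would replace the two copies in B1 and B2 with one.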
Constant Propagation (Classic formulation)
Transformation: Global Constant Folding
• Along every path to p, v has same known value
• Specialize computation at p based on v’s value
Data-flow problem: Constant Propagation
Domain is the set of pairs <vi,ci> where vi is a variable and ci ∈ C
CONSTANTS(b) = ∧ p ∈ preds(b) fp(CONSTANTS(p))
• ∧ performs a pairwise meet on two sets of pairs
• fp(x) is a block specific function that models the effects of block p on the <vi,ci> pairs in x
Constant propagation is a forward flow problem
|CONSTANTS| = |variables|
[Figure: the lattice C, with ⊤ at the top, the constants … c-2 c-1 c0 c1 c2 … in one middle layer, and ⊥ at the bottom]
Form of f is quite different than in AVAIL
Example
Meet operation requires more explanation
• c1 ∧ c2 = c1 if c1 = c2, else ⊥ (bottom & top as expected)
What about fp ?
• If p has one statement then
    x ← y with CONSTANTS(p) = {…<x,l1>,…<y,l2>…}
      then fp(CONSTANTS(p)) = CONSTANTS(p) - <x,l1> + <x,l2>
    x ← y op z with CONSTANTS(p) = {…<x,l1>,…<y,l2>,…<z,l3>…}
      then fp(CONSTANTS(p)) = CONSTANTS(p) - <x,l1> + <x,l2 op l3>
• If p has n statements then
fp(CONSTANTS(p)) = fn(fn-1(fn-2(…f2(f1(CONSTANTS(p)))…)))
where fi is the function generated by the ith statement in p
fp interprets p over CONSTANTS
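The meet and the block function fp can be sketched in a few lines. Everything about the representation is an assumption made for illustration: the strings "TOP" and "BOT" stand for ⊤ and ⊥, Python ints are the constants, CONSTANTS is a dictionary, a variable missing from the dictionary is treated as ⊥, and a statement is a hypothetical tuple (dst, left, op, right) with op set to None for a copy.

```python
import operator

TOP, BOT = "TOP", "BOT"          # lattice extremes; ints are the constants
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def meet(c1, c2):
    """c1 ^ c2 = c1 if c1 = c2, else BOT; TOP is the identity."""
    if c1 == TOP:
        return c2
    if c2 == TOP:
        return c1
    return c1 if c1 == c2 else BOT

def meet_env(e1, e2):
    """Pairwise meet on two sets of <variable, value> pairs."""
    names = e1.keys() | e2.keys()
    return {v: meet(e1.get(v, BOT), e2.get(v, BOT)) for v in names}

def value_of(operand, env):
    return operand if isinstance(operand, int) else env.get(operand, BOT)

def apply_stmt(stmt, env):
    """Model one statement: replace <dst, old> with <dst, new>."""
    dst, left, op, right = stmt
    a = value_of(left, env)
    if op is None:                       # x <- y
        result = a
    else:                                # x <- y op z
        b = value_of(right, env)
        if BOT in (a, b):
            result = BOT
        elif TOP in (a, b):
            result = TOP
        else:
            result = OPS[op](a, b)
    out = dict(env)
    out[dst] = result
    return out

def apply_block(stmts, env):
    """fp = fn(...f2(f1(env))...): interpret the block over CONSTANTS."""
    for stmt in stmts:
        env = apply_stmt(stmt, env)
    return env

block = [("x", "y", None, None), ("z", "x", "+", 2)]   # x <- y ; z <- x + 2
out = apply_block(block, {"y": 3})
print(out, meet_env({"a": 1, "b": 2}, {"a": 1, "b": 3}))
```

Treating a missing variable as ⊥ is the pessimistic choice; an optimistic solver would start unknown values at ⊤ instead.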
Proliferation of Data-flow Problems
If we continue in this fashion,
• We will need a huge set of data-flow problems
• Each will have slightly different initial information
• This seems like a messy way to go …
Desiderata
• Solve one data-flow problem
• Use it for all transformations
To address this issue, researchers invented information chains
Topic of next lecture
Some problems are not admissible
Global constant propagation
• First condition in admissibility
  ∀ f ∈ F, ∀ x,y ∈ L, f(x ∧ y) = f(x) ∧ f(y)
• Constant propagation is not admissible
  - Kam & Ullman time bound does not hold
  - There are tight time bounds, however, based on lattice height
  - Require a variable-by-variable formulation …
Example: a block containing a ← b + c, reached with S1: {b=3,c=4} and S2: {b=1,c=6}
• Function “f” models block’s effects
• f(S1) = {a=7,b=3,c=4}
• f(S2) = {a=7,b=1,c=6}
• f(S1) ∧ f(S2) = {a=7}, but f(S1 ∧ S2) = Ø
f and ∧ do not distribute
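The S1/S2 example can be replayed directly. In this sketch a set of known constants is a dictionary, the meet keeps only the pairs on which both inputs agree, and f models the single statement a ← b + c; the function names are chosen for this illustration.

```python
def meet(s1, s2):
    """Keep only the <variable, constant> pairs present in both sets."""
    return {v: c for v, c in s1.items() if s2.get(v) == c}

def f(s):
    """Model a <- b + c over a set of known constants."""
    out = {v: c for v, c in s.items() if v != "a"}   # old value of a is gone
    if "b" in s and "c" in s:
        out["a"] = s["b"] + s["c"]                   # a is known only if b, c are
    return out

s1 = {"b": 3, "c": 4}
s2 = {"b": 1, "c": 6}

apply_then_meet = meet(f(s1), f(s2))   # f(S1) ^ f(S2)
meet_then_apply = f(meet(s1, s2))      # f(S1 ^ S2)
print(apply_then_meet, meet_then_apply)
```

Applying f before the meet discovers that a is 7 on both paths; meeting first throws away b and c, so nothing about a can be recovered. The two results differ, which is exactly the failure of the first admissibility condition.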
Interprocedural May Modify Information
At a call site, what variables may be modified by its execution?
• Generalized answer for each procedure p : GMOD(p)
  - GMOD(p) is independent of the calling context of p
• IMOD(p) is the set of variables modified locally in p
• Project an answer back to each call site s : MOD(s)
• Alias-free MOD(s) is DMOD(s), direct modification side effect
GMOD(p) = IMOD(p) ∪ ( ∪ s ∈ succ(p) DMOD(s) )
DMOD(s) = Bs(GMOD(q)), where q is callee at s
• To compute final MOD sets, must factor in ALIAS information
  - If <x,y> ∈ ALIAS(p) & x ∈ DMOD(s) for some s ∈ p, then put both x & y in MOD(s)
Flow-insensitive formulation by Banning
Bs models the effects of call-by-reference binding.
With call-by-value, this problem is much simpler.
Some admissible problems are not rapid
Flow-insensitive Interprocedural May Modify sets
• Iterations proportional to number of parameters
  - Not a function of the call graph
  - Can make example arbitrarily bad
• Proportional to length of chain of bindings…
shift(a,b,c,d,e,f) {
    local t;
    …
    call shift(t,a,b,c,d,e);
    f = 1;
    …
}
• Assume call-by-reference (Call-by-value is simpler)
• Compute the set of variables (in shift) that can be modified by a call to shift
• How long does it take?
[Figure: shift and its parameters a b c d e f]
Nothing to do with d(G)
Hint: build a sparse evaluation graph based on parameter binding [Cooper & Kennedy, PLDI 1988]
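To see how long the naive iteration takes on shift, the sketch below iterates GMOD = IMOD ∪ Bs(GMOD) for its single recursive call site. It is a simplification made for illustration, not the Cooper & Kennedy algorithm: it models only the binding of callee formals to call-site actuals, ignores globals and aliases, and the function name `solve_gmod` is hypothetical.

```python
def solve_gmod(imod, formals, actuals):
    """Iterate GMOD = IMOD | B(GMOD) for one self-recursive call site."""
    binding = dict(zip(formals, actuals))   # callee formal -> caller actual
    gmod = set(imod)
    passes, changed = 0, True
    while changed:
        passes += 1
        # DMOD(s): project modified formals back onto the actuals at s
        dmod = {binding[v] for v in gmod if v in binding}
        new = gmod | dmod
        changed = new != gmod
        gmod = new
    return gmod, passes

formals = ["a", "b", "c", "d", "e", "f"]
actuals = ["t", "a", "b", "c", "d", "e"]     # call shift(t,a,b,c,d,e)
gmod, passes = solve_gmod({"f"}, formals, actuals)   # f = 1 gives IMOD = {f}
print(sorted(gmod), passes)
```

Each pass moves the modification one step along the binding chain f → e → d → c → b → a → t, so the pass count grows with the number of parameters even though the call graph has a single node and a single edge.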