proliferation of data-flow problems copyright 2011, keith d. cooper & linda torczon, all rights...

Proliferation of Data-flow Problems

Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved.

Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.

Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Comp 512Spring 2011

Data-flow Basics: Review from Last Lecture

• Graph, sets, equations, & initial values define a problem Sets form a semilattice Equations treated as transfer functions Properties of semilattice & transfer functions determine

termination, and correctness

• Dominator calculation One of the simplest forward data-flow problems

• Liveness information A typical (backwards) global data-flow problem

• Both DOM and LIVE are well behaved Admissible frameworks with unique fixed points

• Postorder, rPostorder, Preorder, & rPreorder

COMP 512, Rice University

2

Today’s Lecture

• Framework properties and speed of convergence Kam-Ullman rapid condition Worklist iterative algorithm

• More data-flow problems Available expressions Very busy expressions Constant propagation (Kildall style, not

SCCP)

• Problems that don’t fit the Kam-Ullman model for speed


3


4

Iterative Data-flow Analysis

Correctness

• If every fn F is monotone, i.e., f(xy) ≤ f(x) f(y), and

• If the lattice is bounded, i.e., every descending chain is finite Chain is sequence x1, x2, …, xn where xi L, 1 ≤ i ≤ n

xi > xi+1, 1 ≤ i < n chain is descending

Then

• The round-robin algorithm computes a least fixed-point (LFP)

• The uniqueness of the solution depends on other properties of F

• Unique solution it finds the one we want

• Multiple solutions we need to know which one it finds

Reference material


5


Correctness

• Does the iterative algorithm compute the desired answer?

Admissible Function Spaces

1. f F, x,y L, f (x y) = f (x) f (y)

2. fi F such that x L, fi(x) = x

3. f,g F h F such that h(x ) = f (g(x))

4. x L, a finite subset H F such that x = f H f ()

If F meets these four conditions, then an instance of the problem will have a unique fixed point solution (instance graph + initial values)

LFP = MFP = MOP order of evaluation does not matter

*

Both DOM & LIVE meet all four criteria

If meet does not distribute over function application, then the fixed point solution may not be unique. The iterative algorithm will find a LFP.

Reference material


6


Speed

• For a problem with an admissible function space & a bounded semilattice,

• If the functions all meet the rapid condition, i.e.,

f,g F, x L, f (g()) ≥ g() f (x) x

then, a round-robin, reverse-postorder iterative algorithm will halt in d(G)+3 passes over a graph G

d(G) is the loop-connectedness of the graph w.r.t a DFST Maximal number of back edges in an acyclic path Several studies suggest that, in practice, d(G) is small

(<3) For most CFGs, d(G) is independent of the specific DFST

Sets stabilize in two passes around a

loop

Each pass does O(E ) meets & O(N ) other operations

*

Both DOM & LIVE are rapid frameworks


7

Iterative Data-flow analysis

What does this mean?

• Reverse postorder Easily computed order that increases propagation per pass

• Round-robin iterative algorithm Visit all the nodes in a consistent order (RPO) Do it again until the sets stop changing

• Rapid condition Most classic global data-flow problems meet this condition

These conditions are easily met Admissible framework, rapid function space Round-robin, reverse-postorder, iterative algorithm

The analysis runs in (effectively) linear time


8

The Round-robin Iterative Algorithm

LIVEOUT(nf ) Ø

for all other nodes nLIVEIN(n) UEVAR(n)

change true

while (change)

change false for i 0 to N

n rPreorder(i)TEMP ssucc(n) LIVEIN(s)

if LIVEOUT(n) ≠ TEMP then change true LIVEOUT(n ) TEMP LIVEIN(n) UEVAR(x) (LIVEOUT(x) VARKILL(x))

Meet operation

We could, for brevity, fold the transfer function into the computation of LIVEOUTand eliminate the need to represent LIVEIN.

We use this formulation for clarity.

Transfer function


9

The Worklist Iterative Algorithm

Worklist { all nodes, ni }

LIVEOUT(nf) Ø

for all other nodes nLIVEIN(n) UEVAR(n)

while (Worklist Ø)

remove a node n from Worklist

TEMP ssucc(n) LIVEIN(s)

if LIVEOUT(n) ≠ TEMP then

LIVEOUT(ni ) TEMP

LIVEIN(ni) UEVAR(x) (LIVEOUT(x) VARKILL(x))

Worklist Worklist preds(n)

This algorithm avoids re-evaluating LIVEOUT when none of its constituents has changed. Thus, it does fewer set operations on LIVEOUT & LIVEIN than the round-robin algorithm. It does manipulate the Worklist, which is an added cost.

Implement the Worklist as a set – no duplicate entries


10

Round-robin Versus Worklist Algorithm

Termination, correctness, & complexity arguments are for the round-robin version

• The worklist algorithm does less work; it visits fewer nodes

• Consider a version with two worklists: current & next Draw nodes from current in RPO; add them to next Swap the worklists when current becomes empty (1 r-r pass) Makes same changes, in same order, as round robin Termination & correctness arguments should carry over

• Is this the best order? Maybe not … Iterate around a loop until it stabilizes Many data structures for worklist, many orders Your mileage will vary (measurable

effect )

Worklist algorithm is a sparse version of round-robin algorithm


11


How do we use these results?

Prove that data-flow framework is admissible & rapid Its just algebra Most (but not all) global data-flow problems are rapid This is a property of F

Code up an iterative algorithm World’s simplest data-flow algorithm

• Rely on the results Theoretically sound, robust algorithm

This lets us ignore most of the other data-flow algorithms in 512


12

Live Variables

Transformation: Eliminating unneeded stores

• e in a register, have seen last definition, never again used

• The store is dead (except for debugging)

• Compiler can eliminate the store

Data-flow problem: Live variables

LIVEOUT(b) = s succ(b) (UEVAR(s) (LIVE(s) VARKILL(s)))

LIVEOUT(nf) = Ø

• LIVE(b) is the set of variables live on exit from b

• VARKILL(b) is the set of variables that are redefined in b

• UEVAR(b) is the set of variables used before any redefinition in b

Live analysis is a backward flow problem

|LIVEOUT| = |variables|

LIVE plays an important role in both register allocation and the pruned-SSA construction.

Form of f is same as in AVAIL

*


13

Computing Available Expressions

An expression is available on entry to b if, along every path from the entry node to b, it has been computed & none of its constituent subexpressions has been subsequently modified.

AVAIL(b) = xpred(b) (DEEXPR(x) (AVAIL(x) EXPRKILL(x)))

where

• EXPRKILL(b) is the set of expression killed in b, and

• DEEXPR(b) is the set of expressions defined in b and not subsequently killed in b

Initial conditionsAVAIL(n0) = Ø, because nothing is computed before n0

AVAIL(n) = { all expressions }, because ∧is ∩

Available expressions is a forward data-flow problem

|AVAIL| = |expressions|

Using Available Expressions

AVAIL is the basis for global common subexpression elimination

• If x+y ∈ AVAIL(b) then the compiler can replace any upwards exposed occurrence of x+y in b with a reference to the previously computed value

• Standard caveats occur about preserving prior computations Hash x+y and assign temporary names; copy all x+y to

name Walk backward through code to find minimal set of copies Other clever schemes abount

• Resulting algorithm is similar to value numbering Based on textual identity of expressions, not value identity Recognizes redundancy carried along loop-closing branch

• The classic form of GCSE (Cocke, SIGPLAN Conference, 1970)


14


15

Very Busy Expressions

An expression e is very busy on exit from n, iff e is evaluated & used

along every path from n to nf AND evaluating e at the end of n would

produce the same result as the next evaluation along those paths

The Analysis

• Annotate each block n with a set VERYBUSY(n) that contains each expression

• Perform analysis to find opportunities

• Problem is a backward problem

Initialization(s)

VERYBUSY(nf ) = Ø

VERYBUSY(n) = ?

Recurrence Equation

VERYBUSY(n) = ?

…x+y…

…x+y…

…

…

Hint: Use UEEXPR(n) and EXPRKILL(n)


16

Very Busy Expressions

Transformation: Hoisting

• e defined in every successor of b

• Evaluating e in b produces same result

• Saves code space, but shortens no path

Data-flow problem: Very Busy Expressions

VERYBUSY(b) = s succ(b) (UEEXPR(s) (VERYBUSY(s) - EXPRKILL(s)))

VERYBUSY(nf) = Ø

• VERYBUSY(b) contains expressions that are very busy at end of b

• UEEXPR(b) is the set of expressions used before they are killed in b

• EXPRKILL(b) is the set of expressions killed before they are used in b

VERYBUSY expressions is a backward flow problem

|VERYBUSY| = |expressions|

Form of f is same as in LIVE

*


17

Constant Propagation (Classic formulation)

Transformation: Global Constant Folding

• Along every path to p, v has same known value

• Specialize computation at p based on v’s value

Data-flow problem: Constant Propagation

Domain is the set of pairs <vi,ci> where vi is a variable and ci C

CONSTANTS(b) = p preds(b) fp(CONSTANTS(p))

• performs a pairwise meet on two sets of pairs

• fp(x) is a block specific function that models the effects of block p on the <vi,ci> pairs in x

Constant propagation is a forward flow problem

|CONSTANTS| = |variables|

… c-2 c-1 c0 c1 c2 ...

The Lattice C

Form of f is quite different than in AVAIL

*


18

Example

Meet operation requires more explanation

• c1 c2 = c1 if c1 = c2, else (bottom & top as expected)

What about fp ?

• If p has one statement then

x y with CONSTANTS(p) = {…<x,l1>,…<y,l2>…}

then fp(CONSTANTS(p)) = CONSTANTS(p) - <x,l1> + <x,l2>

X y op z with CONSTANTS(p) = {…<x,l1>,…<y,l2>… >,…<z,l3>…}

then fp(CONSTANTS(p)) = CONSTANTS(p) - <x,l1> + <x,l2 op l3>

• If p has n statements then

fp(CONSTANTS(p)) = fn(fn-1(fn-2(…f2(f1(CONSTANTS(p)))…)))

where fi is the function generated by the ith statement in p

fp interprets p over CONSTANTS


19

Proliferation of Data-flow Problems

If we continue in this fashion,

• We will need a huge set of data-flow problems

• Each will have slightly different initial information

• This seems like a messy way to go …

Desiderata

• Solve one data-flow problem

• Use it for all transformations

To address this issue, researchers invented information chains

Topic of next lecture


20

Some problems are not admissible

Global constant propagation

• First condition in admissibility f F, x,y L, f (x y) = f (x) f (y)

• Constant propagation is not admissible Kam & Ullman time bound does not hold There are tight time bounds, however, based on lattice

height Require a variable-by-variable formulation …

a b + c

• Function “f” models block’s effects

• f(S1) = {a=7,b=3,c=4}

• f(S2) = {a=7,b=1,c=6}

• f(S1S2) = Ø

S1: {b=3,c=4}

S2: {b=1,c=6}

F and do not distribute

Interprocedural May Modify Information

At a call site, what variables may be modified by its execution?

• Generalized answer for each procedure p : GMOD(P) GMOD(p) is independent of the calling context of p

• IMOD(p) is the set of variables modified locally in p

• Project an answer back to each call site s : MOD(s)

• Alias-free MOD(s) is DMOD(s), direct modification side effect

GMOD(p) = IMOD(p) ( s ∈ succ(p) DMOD(s))

DMOD(s) = Bs(GMOD(p)), where p is callee at s

• To compute final MOD sets, must factor in ALIAS information If <x,y> ∈ ALIAS(p) & X ∈ DMOD(s) for some s ∈ p, then

put both x & y in MOD(s)


21

Flow-insensitive formulation by Banning

Be models the effects of call-by-reference binding.

With call-by-value, this problem is much simpler.

Some admissible problems are not rapid

Flow-insensitive Interprocedural May Modify sets

• Iterations proportional to number of parameters Not a function of the call graph Can make example arbitrarily bad

• Proportional to length of chain of bindings…


22

shift(a,b,c,d,e,f){ local t; … call shift(t,a,b,c,d,e); f = 1; …}

• Assume call-by-reference (Call-by-value is simpler)

• Compute the set of variables (in shift) that can be modified by a call to shift

• How long does it take?

shift

a b c d e f Nothing to do with d(G)

Hint: build a sparse evaluation graph based on parameter binding [Cooper & Kennedy, PLDI 1988]

proliferation of data-flow problems copyright 2011, keith d. cooper & linda torczon, all rights...

Documents