Machine-Independent Optimizations Ⅰ
CS308 Compiler Theory 1
Code optimization
• Elimination of unnecessary instructions
• Replacement of one sequence of instructions by a faster sequence of instructions
• Local optimization
• Global optimizations
– based on data-flow analyses
The Principal Sources of Optimization
• Optimization
– Preserves the semantics of the original program
– Applies relatively low-level semantic transformations
Causes of Redundancy
• Redundant operations are
– present at the source level, or
– a side effect of having written the program in a high-level language
• Each high-level data-structure access expands into a number of low-level arithmetic operations
• Programmers are not aware of these low-level operations and cannot eliminate the redundancies themselves.
• By having a compiler eliminate the redundancies
– The programs are both efficient and easy to maintain.
A Running Example: Quicksort
Semantics-Preserving Transformations
• A number of ways in which a compiler can improve a program without changing the function it computes
– Common-subexpression elimination
– Copy propagation
– Dead-code elimination
– Constant folding
Common Subexpressions
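• An occurrence of an expression E is a common subexpression if
– E was previously computed, and
– the values of the variables in E have not changed since that computation
• Local: within a single basic block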
Common Subexpressions
• Global: across basic blocks
Copy Propagation
• Copy statements, or copies
– statements of the form u = v
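A minimal sketch of copy propagation inside one basic block may make the idea concrete. The statement encoding `(dest, op, args)` and the function name `propagate_copies` are assumptions made for this illustration, not part of the slides.

```python
def propagate_copies(block):
    """Replace uses of copy targets by their sources within one basic block.

    Each statement is (dest, op, args); a copy u = v is (u, "copy", (v,)).
    """
    copies = {}   # maps u -> v for copies u = v that still hold
    result = []
    for dest, op, args in block:
        # Rewrite operands using the copies known to hold at this point.
        new_args = tuple(copies.get(a, a) for a in args)
        # Assigning to dest invalidates every copy that mentions dest.
        copies = {u: v for u, v in copies.items() if u != dest and v != dest}
        if op == "copy" and new_args[0] != dest:
            copies[dest] = new_args[0]
        result.append((dest, op, new_args))
    return result

block = [
    ("x", "copy", ("t3",)),      # x = t3
    ("a", "add", ("x", "b")),    # a = x + b   becomes a = t3 + b
    ("t3", "add", ("t3", "1")),  # t3 = t3 + 1 invalidates the copy x = t3
    ("c", "add", ("x", "a")),    # c = x + a   must keep using x
]
optimized = propagate_copies(block)
```

After propagation the copy x = t3 often has no remaining uses, which is what lets dead-code elimination remove it.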
Dead-Code Elimination
• Live variable
– A variable is live at a point in a program if its value can be used subsequently;
– otherwise, it is dead at that point.
• Constant folding
– Deducing at compile time that the value of an expression is a constant and using the constant instead
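Constant folding can be sketched on a tiny expression tree. The tuple encoding `(op, left, right)` and the helper name `fold` are hypothetical choices for this example only.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Fold constant subexpressions of a tuple-encoded expression tree.

    An expression is an int constant, a variable name (str),
    or a triple (op, left, right).
    """
    if not isinstance(expr, tuple):
        return expr                      # a constant or a variable
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)      # both operands known at compile time
    return (op, left, right)             # keep the operation, with folded operands

# x + 2 * (10 - 4): the right operand is entirely constant.
folded = fold(("+", "x", ("*", 2, ("-", 10, 4))))
```

Here `folded` is the tree for x + 12: the variable x blocks folding at the root, but the constant subtree is replaced by its value.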
Code Motion
• An important modification that decreases the amount of code in a loop
• Loop-invariant computation
– An expression that yields the same result independent of the number of times a loop is executed
• Code motion moves a loop-invariant computation to a point before its loop, so that it is evaluated once instead of on every iteration
• Before code motion:
while (i <= limit-2)
• After code motion:
t = limit-2
while (i <= t)
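The effect of hoisting limit-2 can be checked with a small sketch. The functions `count_before` and `count_after` and their `evaluations` counters are illustrative assumptions; the counter only makes the saved work visible.

```python
def count_before(limit):
    """Loop whose test recomputes limit - 2 on every iteration."""
    i, evaluations = 0, 0
    while True:
        evaluations += 1              # limit - 2 is evaluated at each test
        if not (i <= limit - 2):
            break
        i += 1
    return i, evaluations

def count_after(limit):
    """Same loop after code motion: limit - 2 is computed once, before the loop."""
    t = limit - 2                     # loop-invariant computation, hoisted
    evaluations = 1
    i = 0
    while i <= t:
        i += 1
    return i, evaluations

before = count_before(10)
after = count_after(10)
```

Both versions leave i with the same final value. The hoisting is only valid because the loop body does not change limit.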
Induction Variables and Reduction in Strength
• Induction variable
– For an induction variable x, there is a positive or negative constant c such that each time x is assigned, its value increases by c
• Induction variables can be computed with a single increment (addition or subtraction) per loop iteration
• Strength reduction
– The transformation of replacing an expensive operation, such as multiplication, by a cheaper one, such as addition
• Induction variables enable
– strength reduction
– elimination of computation
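As a rough illustration of strength reduction on an induction variable, the sketch below computes the values 4 * j for successive j in two ways. The function names and the constant 4 are assumptions chosen for the example.

```python
def offsets_with_multiply(n):
    """t = 4 * j is recomputed with a multiplication on every iteration."""
    out = []
    for j in range(n):
        out.append(4 * j)
    return out

def offsets_strength_reduced(n):
    """t is an induction variable in lockstep with j: it grows by 4 per step."""
    out = []
    t = 0                  # initial value, 4 * 0
    for _ in range(n):
        out.append(t)
        t += 4             # an addition replaces the multiplication 4 * j
    return out

slow = offsets_with_multiply(6)
fast = offsets_strength_reduced(6)
```

In the second version j itself is no longer needed to compute t, which is the sense in which induction variables can also eliminate computation.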
Now we have:
Inside-out
Test yourself
• E-9.1.1
Data-Flow Analysis
• Techniques that derive information about the flow of data along program execution paths
• Examples
– One way to implement global common-subexpression elimination requires us to determine whether two identical expressions evaluate to the same value along any possible execution path of the program.
– If the result of an assignment is not used along any subsequent execution path, then we can eliminate the assignment as dead code.
The Data-Flow Abstraction
• Execution paths
– Within one basic block, the program point after a statement is the same as the program point before the next statement.
– If there is an edge from block B1 to block B2, then the program point after the last statement of B1 may be followed immediately by the program point before the first statement of B2.
• Define an execution path from point P1 to point Pn to be a sequence of points P1, P2, ..., Pn such that for each i = 1, 2, ..., n - 1, either
1. Pi is the point immediately preceding a statement and Pi+1 is the point immediately following that same statement, or
2. Pi is the end of some block and Pi+1 is the beginning of a successor block.
• Reaching definitions
– The definitions that may reach a program point along some path
The Data-Flow Analysis Schema
• Data-flow value
– represents an abstraction of the set of all possible program states that can be observed for a program point
• Domain
– The set of possible data-flow values for the application.
– Example: the domain of data-flow values for reaching definitions is the set of all subsets of definitions in the program.
• Denote the data-flow values before and after each statement s by IN[s] and OUT[s]
• Data-flow problem
– to find a solution to a set of constraints on the IN[s]'s and OUT[s]'s, for all statements s.
– Two sets of constraints: those based on the semantics of the statements ("transfer functions") and those based on the flow of control.
Transfer Functions
• The data-flow values before and after a statement are constrained by the semantics of the statement.
• Transfer function
– Example: after the statement b = a, both a and b have the same value.
– The transfer function of a statement s is denoted f_s
• Two flavors of transfer function
– Forward: information propagates forward along execution paths, OUT[s] = f_s(IN[s])
– Backward: information flows backwards up the execution paths, IN[s] = f_s(OUT[s])
Control-Flow Constraints
• Simple within a basic block
– if a block B consists of statements S1, S2, ..., Sn in that order, then the data-flow value out of Si is the same as the data-flow value into Si+1, that is, IN[Si+1] = OUT[Si].
• More complicated between basic blocks
Data-Flow Schemas on Basic Blocks
• IN[B], OUT[B]
– denote the data-flow values immediately before and immediately after basic block B
• Suppose block B consists of statements S1, ..., Sn, in that order.
– IN[B] = IN[S1], OUT[B] = OUT[Sn]
• f_B = f_Sn ∘ ... ∘ f_S2 ∘ f_S1, so that for a forward problem OUT[B] = f_B(IN[B])
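The composition of statement transfer functions into a block transfer function can be sketched as follows. The helper `compose_block` and the two toy set-valued functions `f_s1` and `f_s2` are hypothetical and serve only to show the order of application.

```python
from functools import reduce

def compose_block(statement_functions):
    """Return f_B for statement functions listed in program order (S1 first)."""
    def f_block(x):
        # Apply f_S1 first and f_Sn last, i.e. f_Sn(...f_S2(f_S1(x))...).
        return reduce(lambda value, f: f(value), statement_functions, x)
    return f_block

# Toy transfer functions over sets of names (illustrative only).
f_s1 = lambda x: {"d1"} | (x - {"d2"})
f_s2 = lambda x: {"d2"} | (x - {"d1"})

f_b = compose_block([f_s1, f_s2])
out_b = f_b({"d0", "d2"})
```

Listing the functions in program order and folding left to right gives the same result as the right-to-left composition notation.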
Reaching Definitions
• A definition d reaches a point p if there is a path from the point immediately following d to p, such that d is not "killed" along that path.
• A definition of a variable x is killed if there is any other definition of x anywhere along the path.
• Conservative
– if we do not know whether a statement s is assigning a value to x, we must assume that it may assign to it.
Transfer Equations for Reaching Definitions
• Consider a definition d: u = v + w. This statement
– generates a definition d of variable u, and
– kills all other definitions in the program that define variable u
• The transfer function of definition d can be expressed as
f_d(x) = gen_d ∪ (x - kill_d)
• where gen_d = {d}, the set of definitions generated by the statement, and kill_d is the set of all other definitions of u in the program.
Transfer Equations for Reaching Definitions
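• If
f_1(x) = gen_1 ∪ (x - kill_1) and f_2(x) = gen_2 ∪ (x - kill_2)
• Then
f_2(f_1(x)) = gen_2 ∪ ((gen_1 ∪ (x - kill_1)) - kill_2)
= (gen_2 ∪ (gen_1 - kill_2)) ∪ (x - (kill_1 ∪ kill_2))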
Transfer Equations for Reaching Definitions
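• Suppose block B has n statements, with transfer functions f_i(x) = gen_i ∪ (x - kill_i) for i = 1, 2, ..., n. Then
f_B(x) = gen_B ∪ (x - kill_B)
where
kill_B = kill_1 ∪ kill_2 ∪ ... ∪ kill_n
gen_B = gen_n ∪ (gen_{n-1} - kill_n) ∪ (gen_{n-2} - kill_{n-1} - kill_n) ∪ ... ∪ (gen_1 - kill_2 - kill_3 - ... - kill_n)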
Transfer Equations for Reaching Definitions
• The gen set contains all the definitions inside the block that are "visible" immediately after the block
• Downwards exposed
– A definition is downwards exposed in a basic block only if it is not "killed" by a subsequent definition to the same variable inside the same basic block.
– A basic block's kill set is simply the union of all the definitions killed by the individual statements.
• Example: a block containing two definitions d1 and d2 of the same variable, in that order
– kill = kill_1 ∪ kill_2 = {d1, d2}
– gen = gen_2 ∪ (gen_1 - kill_2) = {d2}
– f(x) = {d2} ∪ (x - {d1, d2}), which always includes d2
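The d1/d2 example above can be reproduced with a short sketch that folds per-statement gen and kill sets into block-level sets. The function names `block_gen_kill` and `transfer`, and the extra definition `d3` in the usage line, are assumptions for illustration.

```python
def block_gen_kill(statements):
    """Combine per-statement (gen, kill) pairs, given in program order."""
    gen_b, kill_b = set(), set()
    for gen_s, kill_s in statements:
        # A later statement hides earlier definitions that it kills.
        gen_b = gen_s | (gen_b - kill_s)
        # The block kills everything any of its statements kills.
        kill_b = kill_b | kill_s
    return gen_b, kill_b

def transfer(gen_b, kill_b, x):
    """Block transfer function f_B(x) = gen_B U (x - kill_B)."""
    return gen_b | (x - kill_b)

# d1 and d2 define the same variable, so each kills the other.
gen_b, kill_b = block_gen_kill([({"d1"}, {"d2"}), ({"d2"}, {"d1"})])
reaching_out = transfer(gen_b, kill_b, {"d1", "d3"})
```

Only d2 is downwards exposed, so `gen_b` is {d2} while `kill_b` contains both definitions; a definition d3 of some other variable passes through the block unchanged.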
Control-Flow Equations
• OUT[P] ⊆ IN[B] whenever there is a control-flow edge from P to B.
• IN[B] needs to be no larger than the union of the reaching definitions of all the predecessor blocks, so IN[B] is the union of OUT[P] over all predecessors P of B
Iterative Algorithm for Reaching Definitions
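• The reaching definitions problem is defined by the following equations:
OUT[ENTRY] = ∅
• and, for all basic blocks B other than ENTRY,
OUT[B] = gen_B ∪ (IN[B] - kill_B)
IN[B] = ∪ OUT[P], over all predecessors P of B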
Iterative Algorithm for Reaching Definitions
Algorithm : Reaching definitions.
INPUT: A flow graph for which killB and genB have been computed for each block B.
OUTPUT: IN[B] and OUT[B], the sets of definitions reaching the entry and exit of each block B of the flow graph.
METHOD:
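The method itself did not survive in this transcript. One common way to solve the reaching-definitions equations is round-robin iteration to a fixed point, sketched below. The function name, the dictionary-based flow-graph encoding, and the three-block example are assumptions, not the slide's own pseudocode.

```python
def reaching_definitions(blocks, preds, gen, kill):
    """Iterate the reaching-definitions equations until nothing changes.

    blocks: block names other than ENTRY
    preds:  block -> list of predecessor names (may include "ENTRY")
    """
    out = {b: set() for b in blocks}
    out["ENTRY"] = set()                      # boundary condition
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # IN[B] is the union of OUT[P] over all predecessors P.
            in_[b] = set().union(*(out[p] for p in preds[b]))
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out

# Hypothetical flow graph: ENTRY -> B1 -> B2 -> B3, with a back edge B2 -> B2.
# d1, d3 define i; d2, d4 define j.
blocks = ["B1", "B2", "B3"]
preds = {"B1": ["ENTRY"], "B2": ["B1", "B2"], "B3": ["B2"]}
gen = {"B1": {"d1", "d2"}, "B2": {"d3"}, "B3": {"d4"}}
kill = {"B1": {"d3", "d4"}, "B2": {"d1"}, "B3": {"d2"}}
in_sets, out_sets = reaching_definitions(blocks, preds, gen, kill)
```

Starting every OUT set empty and only ever adding definitions means the sets grow monotonically, so the loop must terminate.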
Live-Variable Analysis
• In live-variable analysis we wish to know for variable x and point p whether the value of x at p could be used along some path in the flow graph starting at p. If so, we say x is live at p; otherwise, x is dead at p.
• Definitions:
1. def_B: the set of variables defined in B prior to any use of that variable in B
2. use_B: the set of variables whose values may be used in B prior to any definition of the variable.
Live-Variable Analysis
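• Equations relating def and use to the unknowns IN and OUT:
IN[EXIT] = ∅
• and, for all basic blocks B other than EXIT,
IN[B] = use_B ∪ (OUT[B] - def_B)
OUT[B] = ∪ IN[S], over all successors S of B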
Live-Variable Analysis
• Algorithm: Live-variable analysis.
• INPUT: A flow graph with def and use computed for each block.
• OUTPUT: IN[B] and OUT[B].
• METHOD:
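The method is missing from this transcript. A minimal sketch of a backward iterative solver for live variables follows. The function name, the graph encoding, and the small loop example are assumptions for illustration.

```python
def live_variables(blocks, succs, use, defs):
    """Iterate the live-variable equations (a backward problem) to a fixed point.

    blocks: block names other than EXIT
    succs:  block -> list of successor names (may include "EXIT")
    """
    in_ = {b: set() for b in blocks}
    in_["EXIT"] = set()                       # boundary condition
    out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # OUT[B] is the union of IN[S] over all successors S.
            out[b] = set().union(*(in_[s] for s in succs[b]))
            new_in = use[b] | (out[b] - defs[b])
            if new_in != in_[b]:
                in_[b] = new_in
                changed = True
    return in_, out

# Hypothetical program:
#   B1: i = 0
#   B2: if i < n: s = s + i; i = i + 1; repeat B2, else go to B3
#   B3: return s
blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": ["EXIT"]}
use = {"B1": set(), "B2": {"i", "n", "s"}, "B3": {"s"}}
defs = {"B1": {"i"}, "B2": set(), "B3": set()}
live_in, live_out = live_variables(blocks, succs, use, defs)
```

Because information flows from successors to predecessors, visiting blocks in reverse order would usually converge in fewer passes; the forward order here is kept only for simplicity.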
Available Expressions
• An expression x + y is available at a point p:
– if every path from the entry node to p evaluates x + y, and after the last such evaluation prior to reaching p, there are no subsequent assignments to x or y.
• A block kills expression x + y:
– if it assigns (or may assign) x or y and does not subsequently recompute x + y.
• A block generates expression x + y:
– if it definitely evaluates x + y and does not subsequently define x or y.
Available Expressions
• The primary use of available-expression information is for detecting global common subexpressions.
Available Expressions
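• Computation of the set of generated expressions
– Suppose the set S of expressions is available at point p, and q is the point after p, with the statement x = y + z between them. The set available at q is obtained in two steps, in this order (x may be the same variable as y or z):
1. Add to S the expression y + z.
2. Delete from S any expression involving variable x.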
Example:
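The two steps can be traced with a short sketch that starts from an empty set at the top of a block. The statement encoding `(x, y, op, z)` for x = y op z and the four-statement block are assumptions for illustration.

```python
def generated_expressions(block):
    """Return the expressions generated by a block of statements x = y op z."""
    available = set()
    for x, y, op, z in block:
        available.add((y, op, z))                         # step 1: add y op z
        # Step 2: delete every expression that involves x.
        available = {e for e in available if x not in (e[0], e[2])}
    return available

block = [
    ("a", "b", "+", "c"),   # a = b + c   -> {b + c}
    ("b", "a", "-", "d"),   # b = a - d   -> {a - d}   (b + c is deleted)
    ("c", "b", "+", "c"),   # c = b + c   -> {a - d}   (b + c added, then deleted)
    ("d", "a", "-", "d"),   # d = a - d   -> {}        (a - d is deleted)
]
e_gen = generated_expressions(block)
```

The third and fourth statements show why the order of the steps matters: an expression whose own operand is assigned must not remain in the set.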
Available Expressions
• Let
IN[B] be the set of expressions that are available before B
OUT[B] be the same for the point following the end of B
e_genB be the expressions generated by B
e_killB be the set of expressions killed in B
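• Then
OUT[ENTRY] = ∅
• and, for all basic blocks B other than ENTRY,
OUT[B] = e_genB ∪ (IN[B] - e_killB)
IN[B] = ∩ OUT[P], over all predecessors P of B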
Available Expressions
• Algorithm: Available expressions.
• INPUT: A flow graph with e_killB and e_genB computed for each block B. The initial block is B1 .
• OUTPUT: IN[B] and OUT[B].
• METHOD:
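The method is missing from this transcript. A minimal sketch of an iterative solver for available expressions follows; it differs from reaching definitions in using intersection as the meet and in starting every OUT set (except ENTRY's) at the full universe of expressions. The function name, the `universe` parameter, and the diamond-shaped example are assumptions, and the sketch assumes every listed block has at least one predecessor.

```python
def available_expressions(blocks, preds, e_gen, e_kill, universe):
    """Iterate the available-expressions equations to a fixed point.

    blocks: block names other than ENTRY, each with at least one predecessor
    universe: the set of all expressions computed in the program
    """
    out = {b: set(universe) for b in blocks}  # optimistic initialization
    out["ENTRY"] = set()                      # nothing is available at entry
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # IN[B] is the intersection of OUT[P] over all predecessors P.
            in_[b] = set(universe).intersection(*(out[p] for p in preds[b]))
            new_out = e_gen[b] | (in_[b] - e_kill[b])
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out

# Hypothetical diamond: ENTRY -> B1 -> {B2, B3} -> B4.
#   B1: t1 = a + b      B2: a = ...; t2 = c * d      B3: t3 = c * d
universe = {"a+b", "c*d"}
blocks = ["B1", "B2", "B3", "B4"]
preds = {"B1": ["ENTRY"], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
e_gen = {"B1": {"a+b"}, "B2": {"c*d"}, "B3": {"c*d"}, "B4": set()}
e_kill = {"B1": set(), "B2": {"a+b"}, "B3": set(), "B4": set()}
avail_in, avail_out = available_expressions(blocks, preds, e_gen, e_kill, universe)
```

At B4 only c*d is available: a+b is killed on the path through B2, and availability requires the expression to survive on every path.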
Test yourself
• gen, kill, IN, OUT sets for each block.
• e_gen, e_kill, IN, OUT sets for available expressions.
• def, use, IN, OUT sets for live variable analysis.