MIT 6.035 Introduction to Program Analysis and Optimization Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology


Page 1:

MIT 6.035
Introduction to Program Analysis and Optimization

Martin Rinard

Laboratory for Computer Science

Massachusetts Institute of Technology

Page 2:

Program Analysis
• Compile-time reasoning about run-time behavior of program
  – Can discover things that are always true:
    • "x is always 1 in the statement y = x + z"
    • "the pointer p always points into array a"
    • "the statement return 5 can never execute"
  – Can infer things that are likely to be true:
    • "the reference r usually refers to an object of class C"
    • "the statement a = b + c appears to execute more frequently than the statement x = y + z"
  – Distinction between data and control-flow properties

Page 3:

Transformations
• Use analysis results to transform program
• Overall goal: improve some aspect of program
• Traditional goals:
  – Reduce number of executed instructions
  – Reduce overall code size
• Other goals emerge as the design space becomes more complex:
  – Reduce number of cycles
    • Use vector or DSP instructions
    • Improve instruction or data cache hit rate
  – Reduce power consumption
  – Reduce memory usage

Page 4:

Control Flow Graph
• Nodes Represent Computation
  – Each Node is a Basic Block
  – Basic Block is a Sequence of Instructions with
    • No Branches Out Of Middle of Basic Block
    • No Branches Into Middle of Basic Block
  – Basic Blocks should be maximal
  – Execution of basic block starts with first instruction
  – Includes all instructions in basic block
• Edges Represent Control Flow
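A node-plus-edges representation along these lines can be sketched in Python (a minimal sketch; the class and field names are illustrative, not taken from the course compiler):

```python
# Minimal CFG sketch: each node is a basic block (a straight-line list of
# instructions); edges record control flow between blocks.

class BasicBlock:
    def __init__(self, instructions):
        self.instructions = list(instructions)  # no branches in or out of middle
        self.successors = []                    # outgoing control-flow edges

    def add_successor(self, block):
        self.successors.append(block)

# Entry block of the example on the next slide, followed by the branch test.
entry = BasicBlock(["s = 0", "a = 4", "i = 0"])
cond = BasicBlock(["k == 0"])
entry.add_successor(cond)
```

Note that the block boundaries enforce the slide's invariant directly: instructions inside a block can only be reached through the block's first instruction.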

Page 5:

Control Flow Graph

int add(n, k) {

  s = 0; a = 4; i = 0;

  if (k == 0) b = 1;

  else b = 2;

  while (i < n) {

    s = s + a*b;

    i = i + 1;

  }

  return s;

}

CFG (diagram): [s = 0; a = 4; i = 0;] → [k == 0] → [b = 1;] or [b = 2;] → [i < n] → loop body [s = s + a*b; i = i + 1;] (back edge to [i < n]) or [return s;]

Page 6:

Basic Block Construction
• Start with instruction control-flow graph
• Visit all edges in graph
• Merge adjacent nodes if
  – Only one edge from first node
  – Only one edge into second node

Example (diagram): the separate nodes [s = 0;] → [a = 4;] merge into the single block [s = 0; a = 4;]
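The merge rule above can be sketched as a small graph-coalescing pass (a minimal sketch; the adjacency-dict representation and function name are assumptions for illustration):

```python
# Sketch of basic block construction by node merging: merge node a into
# its successor b when a has exactly one outgoing edge (to b) and b has
# exactly one incoming edge (from a).

def merge_blocks(nodes, succs, preds):
    """nodes: id -> instruction list; succs/preds: id -> list of ids."""
    changed = True
    while changed:
        changed = False
        for a in list(succs):
            if len(succs.get(a, [])) == 1:
                b = succs[a][0]
                if b != a and len(preds.get(b, [])) == 1:
                    nodes[a].extend(nodes.pop(b))      # a absorbs b's code
                    succs[a] = succs.pop(b, [])        # and b's out-edges
                    for s in succs[a]:                 # rewire predecessors
                        preds[s] = [a if p == b else p for p in preds[s]]
                    preds.pop(b, None)
                    changed = True
                    break
    return nodes
```

On the instruction-level chain [s = 0;] → [a = 4;] → [i = 0;], this collapses all three nodes into one maximal basic block.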

Pages 7–18: (animation) Step-by-step basic block construction for the CFG from Page 5. Starting from the instruction-level graph, the merged entry block grows frame by frame from [s = 0; a = 4;] to [s = 0; a = 4; i = 0;]; the nodes [k == 0], [b = 1;], [b = 2;], [i < n], the loop body [s = s + a*b; i = i + 1;], and [return s;] are then incorporated, each becoming (or joining) a maximal basic block, until the graph matches the control flow graph shown on Page 5.
Page 19:

Program Points, Split and Join Points
• One program point before and after each statement in program
• Split point has multiple successors
  – conditional branch statements are the only split points
• Merge point has multiple predecessors
• Each basic block
  – Either starts with a merge point or its predecessor ends with a split point
  – Either ends with a split point or its successor starts with a merge point

Page 20:

Two Kinds of Variables
• Temporaries Introduced By Compiler
  – Transfer values only within basic block
  – Introduced as part of instruction flattening
  – Introduced by optimizations/transformations
  – Typically assigned to only once
• Program Variables
  – Declared in original program
  – May be assigned to multiple times
  – May transfer values between basic blocks

Page 21:

Basic Block Optimizations
• Common Sub-Expression Elimination
  – a = (x+y)+z; b = x+y;
  – becomes t = x+y; a = t+z; b = t;
• Constant Propagation
  – x = 5; b = x+y;
  – becomes b = 5+y;
• Algebraic Identities
  – a = x * 1;
  – becomes a = x;
• Copy Propagation
  – a = x+y; b = a; c = b+z;
  – becomes a = x+y; b = a; c = a+z;
• Dead Code Elimination
  – a = x+y; b = a; c = a+z;
  – becomes a = x+y; c = a+z;
• Strength Reduction
  – t = i * 4;
  – becomes t = i << 2;

Page 22:

Basic Block Analysis Approach
• Assume normalized basic block - all statements are of the form
  – var = var op var (where op is a binary operator)
  – var = op var (where op is a unary operator)
  – var = var
• Simulate a symbolic execution of basic block
  – Reason about values of variables (or other aspects of computation)
  – Derive property of interest

Page 23:

Value Numbering
• Reason about values of variables and expressions in the program
  – Simulate execution of basic block
  – Assign virtual value to each variable and expression
• Discovered property: which variables and expressions have the same value
• Standard use:
  – Common subexpression elimination
  – Typically combined with transformation that
    • Saves computed values in temporaries
    • Replaces expressions with temporaries when value of expression previously computed

Page 24:

Value Numbering Example

Original Basic Block:    New Basic Block:
  a = x+y                  a = x+y
  b = a+z                  t1 = a
  b = b+y                  b = a+z
  c = a+z                  t2 = b
                           b = b+y
                           t3 = b
                           c = t2

Var to Val: x → v1, y → v2, a → v3, z → v4, b → v5 (then v6), c → v5
Exp to Val: v1+v2 → v3, v3+v4 → v5, v5+v2 → v6
Exp to Tmp: v1+v2 → t1, v3+v4 → t2, v5+v2 → t3

Page 25:

Value Numbering Summary
• Forward symbolic execution of basic block
• Each new value assigned to temporary
  – a = x+y; becomes a = x+y; t = a;
  – Temporary preserves value for use later in program even if original variable rewritten
    • a = x+y; a = a+z; b = x+y; becomes
    • a = x+y; t = a; a = a+z; b = t;
• Maps
  – Var to Val – specifies symbolic value for each variable
  – Exp to Val – specifies value of each evaluated expression
  – Exp to Tmp – specifies tmp that holds value of each evaluated expression
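The three maps can be seen working together in a small sketch of the pass (assumptions: statements are `(dest, op, arg1, arg2)` tuples with binary ops only, temporaries are named t0, t1, ..., and `"copy"` marks a copy instruction — none of this encoding comes from the lecture):

```python
# Sketch of value numbering over a normalized basic block.

def value_number(block):
    var_to_val, exp_to_val, exp_to_tmp = {}, {}, {}
    counter = [0]   # source of fresh symbolic value numbers
    out = []

    def val(v):     # Var to Val lookup, creating a value for new variables
        if v not in var_to_val:
            var_to_val[v] = counter[0]
            counter[0] += 1
        return var_to_val[v]

    for dest, op, a, b in block:
        key = (val(a), op, val(b))
        if key in exp_to_tmp:                    # value computed before:
            out.append((dest, "copy", exp_to_tmp[key], None))   # reuse its tmp
            var_to_val[dest] = exp_to_val[key]
        else:                                    # new value: compute and save
            t = "t%d" % len(exp_to_tmp)
            out.append((dest, op, a, b))
            out.append((t, "copy", dest, None))  # preserve value in temporary
            var_to_val[dest] = exp_to_val[key] = counter[0]
            counter[0] += 1
            exp_to_tmp[key] = t
    return out
```

On the Page 24 block, the second `a+z` is recognized through Exp to Tmp and replaced by a copy from the temporary that saved the first computation.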

Page 26:

Map Usage
• Var to Val
  – Used to compute symbolic value of y and z when processing statement of form x = y + z
• Exp to Tmp
  – Used to determine which tmp to use if value(y) + value(z) previously computed when processing statement of form x = y + z
• Exp to Val
  – Used to update Var to Val when
    • processing statement of the form x = y + z, and
    • value(y) + value(z) previously computed

Page 27:

Interesting Properties
• Finds common subexpressions even if they use different variables in expressions
  – y = a+b; x = b; z = a+x; becomes
  – y = a+b; t = y; x = b; z = t;
  – Why? Because it computes with symbolic values
• Finds common subexpressions even if variable that originally held the value was overwritten
  – y = a+b; x = b; y = 1; z = a+x; becomes
  – y = a+b; t = y; x = b; y = 1; z = t;
  – Why? Because it saves values away in temporaries

Page 28:

One More Interesting Property
• Flattening and CSE combine to capture partial and arbitrarily complex common subexpressions
  – w = (a+b)+c; x = b; y = (a+x)+c; z = a+b;
  – After flattening:
  – t1 = a+b; w = t1+c; x = b; t2 = a+x; y = t2+c; z = a+b;
  – CSE algorithm notices that
    • t1+c and t2+c compute same value
    • In the statement z = a+b, a+b has already been computed so generated code can reuse the result
  – Result: t1 = a+b; w = t1+c; t3 = w; x = b; t2 = a+x; y = t3; z = t1;

Page 29:

Problems
• Algorithm has a temporary for each new value
  – a = x+y; t1 = a;
• Introduces
  – lots of temporaries
  – lots of copy statements to temporaries
• In many cases, temporaries and copy statements are unnecessary
• So we eliminate them with copy propagation and dead code elimination

Page 30:

Copy Propagation
• Once again, simulate execution of program
• If possible, use original variable instead of temporary
  – a = x+y; b = x+y;
  – After CSE becomes a = x+y; t = a; b = t;
  – After CP becomes a = x+y; b = a;
• Key idea:
  – determine when original variable is NOT overwritten between its assignment statement and the use of the computed value
  – If not overwritten, use original variable

Page 31:

Copy Propagation Maps
• Maintain two maps
  – tmp to var: tells which variable to use instead of a given temporary variable
  – var to set: inverse of tmp to var; tells which temps are mapped to a given variable by tmp to var
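The two maps can be sketched in a single forward pass (assumptions for illustration only: statements are `(dest, op, arg1, arg2)` tuples, `"copy"` marks copies, and temporaries are recognized by a `t` name prefix — a real compiler would flag them explicitly):

```python
# Sketch of copy propagation using the tmp-to-var and var-to-set maps.

def copy_propagate(block):
    tmp_to_var = {}   # which variable to use instead of a given temporary
    var_to_set = {}   # inverse: variable -> set of temps mapped to it
    out = []
    for dest, op, a, b in block:
        a = tmp_to_var.get(a, a)      # rewrite operands through the map
        b = tmp_to_var.get(b, b)
        if op == "copy" and dest.startswith("t"):
            tmp_to_var[dest] = a      # tmp = var: remember the original var
            var_to_set.setdefault(a, set()).add(dest)
        out.append((dest, op, a, b))
        # dest is overwritten: temps that mapped to it must map to themselves
        for t in var_to_set.pop(dest, set()):
            tmp_to_var[t] = t
    return out
```

Running this on the Page 32 example rewrites c = t1 to c = a, and the final a = b evicts t1 from a's set, exactly as the map trace on the following slides shows.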

Page 32:

Copy Propagation Example

• Original:
  a = x+y
  b = a+z
  c = x+y
  a = b

• After CSE:
  a = x+y
  t1 = a
  b = a+z
  t2 = b
  c = t1
  a = b

• After CSE and Copy Propagation:
  a = x+y
  t1 = a
  b = a+z
  t2 = b
  c = a
  a = b

Pages 33–38: (animation) Copy propagation trace over the basic block after CSE from Page 32, showing the tmp to var and var to set maps after each statement:
• a = x+y; t1 = a: tmp to var = {t1 → a}, var to set = {a → {t1}}
• b = a+z; t2 = b: tmp to var = {t1 → a, t2 → b}, var to set = {a → {t1}, b → {t2}}
• c = t1: t1 maps to a, so the statement becomes c = a
• a = b: a is overwritten, so t1 is removed from a's set and now maps to itself: tmp to var = {t1 → t1, t2 → b}, var to set = {a → {}, b → {t2}}

Page 39:

Dead Code Elimination
• Copy propagation keeps all temps around
• May be temps that are never read
• Dead Code Elimination removes them

Basic Block After CSE and Copy Prop:
  a = x+y
  t1 = a
  b = a+z
  t2 = b
  c = a
  a = b

Basic Block After CSE, Copy Prop, and Dead Code Elimination:
  a = x+y
  b = a+z
  c = a
  a = b

Page 40:

Dead Code Elimination
• Basic Idea
  – Process Code In Reverse Execution Order
  – Maintain a set of variables that are needed later in computation
  – If encounter an assignment to a temporary that is not needed, remove assignment
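The reverse-order scan can be sketched directly (same illustrative tuple encoding and `t`-prefix temporary convention as before — assumptions, not the course's representation; `live_out` is the set of variables live on exit from the block):

```python
# Sketch of dead code elimination over one basic block: walk the
# statements backwards, tracking the set of variables still needed.

def dead_code_elim(block, live_out):
    needed = set(live_out)
    kept = []
    for dest, op, a, b in reversed(block):
        if dest.startswith("t") and dest not in needed:
            continue                      # temporary never read later: drop it
        kept.append((dest, op, a, b))
        needed.discard(dest)              # this statement defines dest...
        needed.update(v for v in (a, b) if v is not None)   # ...and uses a, b
    kept.reverse()
    return kept
```

On the Page 39 block with a, b, c live on exit, the scan drops t2 = b and t1 = a and keeps everything else, matching the slide's result.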

Pages 41–49: (animation) Dead code elimination trace over the block a = x+y; t1 = a; b = a+z; t2 = b; c = a; a = b, processed in reverse execution order. The needed set grows from {b} to {a, b} (after c = a) and then to {a, b, z} (after b = a+z). The assignments t2 = b and t1 = a write temporaries that are not in the needed set when encountered, so both are removed, leaving the block shown on Page 39: a = x+y; b = a+z; c = a; a = b.

Page 50:

Interesting Properties
• Analysis and Transformation Algorithms Symbolically Simulate Execution of Program
  – CSE and Copy Propagation go forward
  – Dead Code Elimination goes backwards
• Transformations stacked
  – Group of basic transformations work together
  – Often, one transformation creates inefficient code that is cleaned up by following transformations
  – Transformations can be useful even if original code may not benefit from transformation

Page 51:

Other Basic Block Transformations
• Constant Propagation
• Strength Reduction
  – a * 4 becomes a << 2; 3 * a becomes a + a + a;
• Algebraic Simplification
  – simplify a = a * 1; b = b + 0; (remove multiplications by 1, additions of 0)
• Do these in unified transformation framework, not in earlier or later phases

Page 52:

Summary
• Basic block analyses and transformations
• Symbolically simulate execution of program
  – Forward (CSE, copy prop, constant prop)
  – Backward (Dead code elimination)
• Stacked groups of analyses and transformations that work together
  – CSE introduces excess temporaries and copy statements
  – Copy propagation often eliminates need to keep temporary variables around
  – Dead code elimination removes useless code
• Similar in spirit to many analyses and transformations that operate across basic blocks

Page 53:

MIT 6.035
Introduction to Dataflow Analysis

Martin Rinard

Laboratory for Computer Science

Massachusetts Institute of Technology

Page 54:

Dataflow Analysis
• Used to determine properties of program that involve multiple basic blocks
• Typically used to enable transformations
  – common sub-expression elimination
  – constant and copy propagation
  – dead code elimination
• Analysis and transformation often come in pairs

Page 55:

Reaching Definitions
• Concept of definition and use
  – a = x+y
  – is a definition of a
  – is a use of x and y
• A definition reaches a use if
  – value written by definition
  – may be read by use

Page 56:

Reaching Definitions (diagram)

[s = 0; a = 4; i = 0;] → [k == 0] → [b = 1;] or [b = 2;] → [i < n] → [s = s + a*b; i = i + 1;] (back to [i < n]) or [return s]

Page 57:

Reaching Definitions and Constant Propagation
• Is a use of a variable a constant?
  – Check all reaching definitions
  – If all assign variable to same constant
  – Then use is in fact a constant
• Can replace variable with constant
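The decision rule above is small enough to sketch directly (an illustrative helper, not from the lecture; each entry is the constant a reaching definition assigns, or `None` if that definition assigns a non-constant):

```python
# Sketch: a use is constant iff every reaching definition assigns it
# the same constant value.

def constant_at_use(reaching_defs):
    consts = set(reaching_defs)
    if len(consts) == 1 and None not in consts:
        return consts.pop()   # all reaching definitions agree
    return None               # disagreement or a non-constant definition
```

This mirrors the examples on the next slides: all reaching definitions of a in s = s+a*b assign 4, so a is constant; b has reaching definitions assigning 1 and 2, so it is not.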

Page 58:

Is a Constant in s = s+a*b?

[s = 0; a = 4; i = 0;] → [k == 0] → [b = 1;] or [b = 2;] → [i < n] → [s = s + a*b; i = i + 1;] or [return s]

Yes! On all reaching definitions a = 4

Page 59:

Constant Propagation Transform

[s = 0; a = 4; i = 0;] → [k == 0] → [b = 1;] or [b = 2;] → [i < n] → [s = s + 4*b; i = i + 1;] or [return s]

Yes! On all reaching definitions a = 4

Page 60:

Is b Constant in s = s+a*b?

[s = 0; a = 4; i = 0;] → [k == 0] → [b = 1;] or [b = 2;] → [i < n] → [s = s + a*b; i = i + 1;] or [return s]

No! One reaching definition with b = 1, one reaching definition with b = 2

Page 61:

Computing Reaching Definitions
• Compute with sets of definitions
  – represent sets using bit vectors
  – each definition has a position in bit vector
• At each basic block, compute
  – definitions that reach start of block
  – definitions that reach end of block
• Do computation by simulating execution of program until reaching a fixed point

Page 62:

Reaching Definitions Example (definitions numbered 1–7, one bit each)

[1: s = 0; 2: a = 4; 3: i = 0;] → [k == 0] → [4: b = 1;] or [5: b = 2;] → [i < n] → [6: s = s + a*b; 7: i = i + 1;] or [return s]

Bit vectors at the fixed point: 0000000 at entry; 1110000 out of the first block; 1111100 reaching the loop test after the branch merges; 1111111 inside and after the loop.

Page 63:

Formalizing Analysis
• Each basic block has
  – IN - set of definitions that reach beginning of block
  – OUT - set of definitions that reach end of block
  – GEN - set of definitions generated in block
  – KILL - set of definitions killed in block
• GEN[s = s + a*b; i = i + 1;] = 0000011
• KILL[s = s + a*b; i = i + 1;] = 1010000
• Compiler scans each basic block to derive GEN and KILL sets

Page 64:

Dataflow Equations
• IN[b] = OUT[b1] U ... U OUT[bn]
  – where b1, ..., bn are predecessors of b in CFG
• OUT[b] = (IN[b] - KILL[b]) U GEN[b]
• IN[entry] = 0000000
• Result: system of equations

Page 65:

Solving Equations
• Use fixed point algorithm
• Initialize with solution of OUT[b] = 0000000
• Repeatedly apply equations
  – IN[b] = OUT[b1] U ... U OUT[bn]
  – OUT[b] = (IN[b] - KILL[b]) U GEN[b]
• Until reach fixed point, i.e. until equation application has no further effect
• Use a worklist to track which equation applications may have a further effect

Page 66:

Reaching Definitions Algorithm

for all nodes n in N
  OUT[n] = emptyset;  // or: OUT[n] = GEN[n];
Changed = N;  // N = all nodes in graph
while (Changed != emptyset)
  choose a node n in Changed;
  Changed = Changed - { n };
  IN[n] = emptyset;
  for all nodes p in predecessors(n)
    IN[n] = IN[n] U OUT[p];
  OUT[n] = GEN[n] U (IN[n] - KILL[n]);
  if (OUT[n] changed)
    for all nodes s in successors(n)
      Changed = Changed U { s };
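The pseudocode above translates almost line for line into runnable Python (a sketch; sets of definitions are plain Python sets rather than bit vectors, and the adjacency-dict parameters are an assumed representation):

```python
# Runnable worklist algorithm for reaching definitions.

def reaching_definitions(nodes, preds, succs, gen, kill):
    out = {n: set() for n in nodes}    # OUT[n] = emptyset
    in_ = {n: set() for n in nodes}
    changed = set(nodes)               # worklist: all nodes
    while changed:
        n = changed.pop()
        in_[n] = set()
        for p in preds.get(n, []):     # IN[n] = union of predecessor OUTs
            in_[n] |= out[p]
        new_out = gen[n] | (in_[n] - kill[n])
        if new_out != out[n]:          # OUT changed: revisit successors
            out[n] = new_out
            changed |= set(succs.get(n, []))
    return in_, out
```

A two-block example — b1 defines d1, b2 redefines the same variable as d2 (killing d1) — converges to IN[b2] = {d1}, OUT[b2] = {d2}.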

Page 67:

Questions
• Does the algorithm halt?
  – yes, because transfer function is monotonic
  – if IN increases, OUT increases
  – in the limit, all bits are 1
• If bit is 1, is there always an execution in which corresponding definition reaches basic block?
• If bit is 0, does the corresponding definition ever reach basic block?
• Concept of conservative analysis

Page 68:

Available Expressions
• An expression x+y is available at a point p if
  – every path from the initial node to p evaluates x+y before reaching p,
  – and there are no assignments to x or y after the evaluation but before p.
• Available Expression information can be used to do global (across basic blocks) CSE
• If expression is available at use, no need to reevaluate it

Page 69:

Computing Available Expressions
• Represent sets of expressions using bit vectors
• Each expression corresponds to a bit
• Run dataflow algorithm similar to reaching definitions
• Big difference
  – definition reaches a basic block if it comes from ANY predecessor in CFG
  – expression is available at a basic block only if it is available from ALL predecessors in CFG

Page 70:

Available Expressions Example (diagram)

Blocks: [a = x+y; x == 0], [x = z; b = x+y;], [i < n], loop body [c = x+y; i = i+c;], [d = x+y], [i = x+y;]

Expressions: 1: x+y, 2: i<n, 3: i+c, 4: x==0

Available-expression bit vectors annotated on the diagram: 0000 (entry), 1001, 1000, 1000, 1100, 1100

Page 71:

Global CSE Transform

Blocks after transform: [a = x+y; t = a; x == 0], [x = z; b = x+y; t = b;], [i < n], [c = t; i = i+c;], [d = t;], [i = t;]

Expressions: 1: x+y, 2: i<n, 3: i+c, 4: x==0

Bit vectors as on Page 70: 0000 (entry), 1001, 1000, 1000, 1100, 1100

Must use same temp for CSE in all blocks

Page 72:

Formalizing Analysis
• Each basic block has
  – IN - set of expressions available at start of block
  – OUT - set of expressions available at end of block
  – GEN - set of expressions computed in block
  – KILL - set of expressions killed in block
• GEN[x = z; b = x+y] = 1000
• KILL[x = z; b = x+y] = 1001
• Compiler scans each basic block to derive GEN and KILL sets

Page 73:

Dataflow Equations
• IN[b] = OUT[b1] intersect ... intersect OUT[bn]
  – where b1, ..., bn are predecessors of b in CFG
• OUT[b] = (IN[b] - KILL[b]) U GEN[b]
• IN[entry] = 0000
• Result: system of equations

Page 74:

Solving Equations
• Use fixed point algorithm
• IN[entry] = 0000
• Initialize OUT[b] = 1111
• Repeatedly apply equations
  – IN[b] = OUT[b1] intersect ... intersect OUT[bn]
  – OUT[b] = (IN[b] - KILL[b]) U GEN[b]
• Use a worklist algorithm to reach fixed point

Page 75:

Available Expressions Algorithm

for all nodes n in N
  OUT[n] = E;  // or: OUT[n] = E - KILL[n];
IN[Entry] = emptyset;
OUT[Entry] = GEN[Entry];
Changed = N - { Entry };  // N = all nodes in graph
while (Changed != emptyset)
  choose a node n in Changed;
  Changed = Changed - { n };
  IN[n] = E;  // E is set of all expressions
  for all nodes p in predecessors(n)
    IN[n] = IN[n] intersect OUT[p];
  OUT[n] = GEN[n] U (IN[n] - KILL[n]);
  if (OUT[n] changed)
    for all nodes s in successors(n)
      Changed = Changed U { s };

Page 76:

Questions
• Does algorithm always halt?
• If expression is available in some execution, is it always marked as available in analysis?
• If expression is not available in some execution, can it be marked as available in analysis?
• In what sense is algorithm conservative?

Page 77:

Duality In Two Algorithms
• Reaching definitions
  – Confluence operation is set union
  – OUT[b] initialized to empty set
• Available expressions
  – Confluence operation is set intersection
  – OUT[b] initialized to set of available expressions
• General framework for dataflow algorithms
• Build parameterized dataflow analyzer once, use for all dataflow problems
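The duality can be sketched as a single parameterized forward analyzer (illustrative; the parameter names and adjacency-dict representation are assumptions, and for simplicity a node with no predecessors gets an empty IN set, matching IN[entry] = emptyset):

```python
# Sketch of a parameterized forward dataflow analyzer: reaching
# definitions and available expressions differ only in the confluence
# operator and the initial OUT value.

def forward_dataflow(nodes, preds, succs, gen, kill, confluence, init_out):
    out = {n: set(init_out) for n in nodes}
    changed = set(nodes)
    while changed:
        n = changed.pop()
        pred_outs = [out[p] for p in preds.get(n, [])]
        in_n = confluence(pred_outs) if pred_outs else set()
        new_out = gen[n] | (in_n - kill[n])
        if new_out != out[n]:
            out[n] = new_out
            changed |= set(succs.get(n, []))
    return out

def union(sets):
    return set().union(*sets)

def intersection(sets):
    return set(sets[0]).intersection(*sets[1:])

# Reaching definitions:   confluence=union,        init_out=set()
# Available expressions:  confluence=intersection, init_out=all expressions
```

Instantiated with union and an empty initial OUT, it behaves as the reaching definitions algorithm on Page 66.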

Page 78:

Liveness Analysis
• A variable v is live at point p if
  – v is used along some path starting at p, and
  – there is no definition of v along the path before the use.
• When is a variable v dead at point p?
  – No use of v on any path from p to exit node, or
  – all paths from p redefine v before using v.

Page 79: MIT 6.035 Introduction to Program Analysis and Optimization Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

What Use is Liveness Information?

• Register allocation
  – If a variable is dead, can reassign its register

• Dead code elimination
  – Eliminate assignments to variables not read later
  – But must not eliminate the last assignment to a variable (such as an instance variable) visible outside the CFG
  – Can eliminate other dead assignments
  – Handle by making all externally visible variables live on exit from the CFG
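A minimal sketch of the dead-assignment elimination this enables, assuming each statement is modeled as a (target, used-variables) pair and that assignments have no side effects (both assumptions made for illustration):

```python
# Illustrative dead-store elimination for one basic block, given the set of
# variables live at block exit. Statement encoding is an assumed simplification.

def eliminate_dead_stores(block, live_out):
    live = set(live_out)
    kept = []
    for target, uses in reversed(block):  # walk backwards from block exit
        if target in live:
            kept.append((target, uses))
            live.discard(target)          # this definition kills liveness of target
            live |= set(uses)             # its operands become live
        # else: assignment to a dead variable is dropped entirely
    kept.reverse()
    return kept
```

Because a dropped statement's operands are not marked live, a chain of assignments that feeds only a dead store is eliminated in the same backward pass.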

Page 80

Conceptual Idea of Analysis

• Simulate execution

• But start from exit and go backwards in CFG

• Compute liveness information from end to beginning of basic blocks

Page 81

Liveness Example

[CFG figure: a block "a = x+y; t = a; c = a+x;" branches on "x == 0" to "b = t+z;" and "c = y+1;"; the program points are annotated with the liveness bit vectors 1100111, 1110000, and 1100100]

• Assume a, b, c visible outside method
• So are live on exit
• Assume x, y, z, t not visible
• Represent liveness using bit vector
  – order is abcxyzt

Page 82

Dead Code Elimination

[Same CFG as the liveness example, but with the assignment "c = a+x;" removed by dead code elimination: "a = x+y; t = a;" branches on "x == 0" to "b = t+z;" and "c = y+1;"; bit vectors 1100111, 1110000, and 1100100 as before]

• Assume a, b, c visible outside method
• So are live on exit
• Assume x, y, z, t not visible
• Represent liveness using bit vector
  – order is abcxyzt

Page 83

Formalizing Analysis

• Each basic block has
  – IN - set of variables live at start of block
  – OUT - set of variables live at end of block
  – USE - set of variables with upwards exposed uses in block
  – DEF - set of variables defined in block

• USE[x = z; x = x+1;] = { z } (x not in USE)

• DEF[x = z; x = x+1; y = 1;] = { x, y }

• Compiler scans each basic block to derive USE and DEF sets
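That scan might look as follows; representing statements as (defined-variable, used-variables) pairs is an assumption made for illustration:

```python
# Sketch of the single forward scan deriving USE and DEF for a basic block.

def use_def(block):
    use, defined = set(), set()
    for target, uses in block:
        # a use is upwards exposed only if not preceded by a definition in the block
        use |= set(uses) - defined
        defined.add(target)
    return use, defined
```

On the slide's examples this yields USE = {z} (x's later use is shadowed by its own earlier definition) and DEF = {x, y}.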

Page 84

Algorithm

out[Exit] = emptyset; in[Exit] = use[Exit];

for all nodes n in N - { Exit } in[n] = emptyset;

Changed = N - { Exit };

while (Changed != emptyset)

choose a node n in Changed;

Changed = Changed - { n };

out[n] = emptyset;

for all nodes s in successors(n) out[n] = out[n] U in[s];

in[n] = use[n] U (out[n] - def[n]);

if (in[n] changed)

for all nodes p in predecessors(n)

Changed = Changed U { p };
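An executable sketch of this backward worklist (the CFG encoding is an assumption; use/defs are the per-block USE/DEF sets from the previous slide):

```python
# Sketch of the backward liveness worklist algorithm.
# CFG encoded as dicts of predecessor/successor lists (assumed representation).

def liveness(nodes, preds, succs, use, defs, exit_node):
    out = {n: set() for n in nodes}
    in_ = {n: set() for n in nodes}
    in_[exit_node] = set(use[exit_node])   # externally visible vars live on exit
    changed = set(nodes) - {exit_node}
    while changed:
        n = changed.pop()
        out[n] = set()
        for s in succs[n]:
            out[n] |= in_[s]               # confluence: union over successors
        new_in = use[n] | (out[n] - defs[n])
        if new_in != in_[n]:               # IN changed: revisit predecessors
            in_[n] = new_in
            changed |= set(preds[n]) - {exit_node}
    return in_, out
```

Note the symmetry with the forward algorithms: information flows from successors to predecessors, and a change to IN (rather than OUT) re-enqueues neighbors.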

Page 85

Similar to Other Dataflow Algorithms

• Backwards analysis, not forwards

• Still have transfer functions

• Still have confluence operators

• Can generalize framework to work for both forwards and backwards analyses

Page 86

Analysis Information Inside Basic Blocks

• One detail:
  – Given dataflow information at IN and OUT of node
  – Also need to compute information at each statement of basic block
  – Simple propagation algorithm usually works fine
  – Can be viewed as restricted case of dataflow analysis
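For liveness, that per-statement propagation might be sketched as follows (the (target, used-variables) statement encoding is an assumption):

```python
# Sketch: derive the live set before each statement of a block by walking
# backwards from the block's OUT set.

def liveness_per_statement(block, live_out):
    live = set(live_out)
    before = []
    for target, uses in reversed(block):
        live = (live - {target}) | set(uses)  # kill the definition, add the uses
        before.append(set(live))
    before.reverse()
    return before  # before[i] = variables live just before block[i]
```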

Page 87

Summary

• Basic Blocks and Basic Block Optimizations
  – Copy and constant propagation
  – Common sub-expression elimination
  – Dead code elimination

• Dataflow Analysis
  – Control flow graph
  – IN[b], OUT[b], transfer functions, join points

• Paired analyses and transformations
  – Reaching definitions/constant propagation
  – Available expressions/common sub-expression elimination
  – Liveness analysis/dead code elimination

• Stacked analyses and transformations work together