data-flow analysis. approaches static analysis inspections dependence analysis symbolic execution...

38
Data-Flow Analysis

Upload: leona-jenkins

Post on 28-Dec-2015

237 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Data-Flow Analysis

Page 2: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Approaches

• Static Analysis• Inspections• Dependence analysis• Symbolic execution• Software Verification• Data flow analysis• Concurrency analysis• Interprocedural

analysis

•Dynamic Analysis•Assertions•Error seeding, mutation testing•Coverage criteria•Fault-based testing•Object oriented testing•Regression testing

Page 3: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Data Flow Analysis (DFA)

• Efficient technique for proving properties about programs

• Not as powerful as automated theorem provers, but requires less human expertise

• Uses an annotated control flow graph model of the program

• Compute facts for each node• Use the flow in the graph to compute

facts about the whole program• We’ll focus on single units

Page 4: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Some examples of DFA techniques

• DFA used extensively in program optimization

• e.g., determine if a definition is dead (and can be

removed)

determine if a variable always has a constant value

determine if an assertion is always true and can be

removed

• DFA can also be used to find anomalies in the code• Find def/ref anomalies [Osterweil and Fosdick]• Cecil/Cesar system demonstrated the ability to prove

general user-specified properties [Olender and Osterweil]• FLAVERS demonstrated applicability to concurrent system

[Dwyer and Clarke]

• Why “anomalies” and not faults?• May not correspond to an actual executable failure

Page 5: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Data flow analysis

• First,determine local information that is true at each node in the CFG

• e.g., What variables are defined What variables are referenced

• Usually stored in sets • e.g.,ref(n) is the set of variables

referenced at node n• Second, use this local information and

control flow information to compute global information about the whole program

• Done incrementally by looking at each node’s successors or predecessors

• Use a fixed point algorithm-continue to update global information until a fixed point is reached

Page 6: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Reaching Definitions

• Definition reaches a nodeif there is a def clear path from the definition to that node

• Definition of x at node 1 reaches nodes 2,3,4,5 but not 6

:=x

x:=

5

1

2

3

4

x:=

6

Could be used to determine data dependencies, useful for debugging, data flow testing, etc.

Start from a definition, and move forward on the graph to see how far it reaches

Page 7: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Computing global information- reaching definitions

reaching_def={xi,yj}

3

21reaching_def={xi

}

reaching_def={yj

}

Definitions that might reach a node

reaching_def={yj}

3

21reaching_def={xi,yj}

reaching_def={yj

}

Definitions that must reach a node

Start from a definition, and move forward on the graph to see how far it reaches

Page 8: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Reaching Definitions

• Xi means that the definition of variable x at node i {might|must} reach the current node

• Also stated as {possible|definite} {some|all} {any|all} {may|must}

Page 9: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Computing values for a node

y:= …x

In(n)

Out(n)

Forward flow•Keep track of the definitions into a node that have not been redefined

•Flow out of a node depends on flow in and on what happens in the node

def(n)={y}

ref(n)={x…}

Page 10: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Example: Possible Reaching Definitions

X1Y2

X4

{X1}

{X1,Y2}

{X1,Y2,X4}

{ }

{X1,Y2}

{X1}

{X1,Y2}

{X1,Y2}

{X4,Y2}

{X1,Y2,X4}

int x,y;...x := foo();y := x + 2;if x > 0 then x := x + y;end if;...

x = foo()

y = x + 2

if(x > 0)

x = x + yForward flow,any path problem,

def/ref sets are the initial facts associated with each node

def={x}ref={ }

def={y}ref={x}

def={}ref={x}

def={x}ref={x,y}def={}ref={ }

Page 11: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Example: Definite Reaching Definitions

X1Y2

X4

{X1}

{X1,Y2}

{Y2}

{ }

{X1,Y2}

{X1}

{X1,Y2}

{X1,Y2}

{X4,Y2}

{Y2}

int x,y;...x := foo();y := x + 2;if x > 0 then x := x + y;end if;...

x = foo()

y = x + 2

if(x > 0)

x = x + yForward flow,all path problem,

def/ref sets are the initial facts associated with each node

Page 12: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Example: Definite Reaching Definitions

X1Y2

X4

{X1}

{X1,Y2}

{Y2}

{ }

{X1,Y2}

{X1}

{X1,Y2}

{X1,Y2}

{X4,Y2}

{Y2}

int x,y;...x := foo();y := x + 2;if x > 0 then x := x + y;end if;...

x = foo()

y = x + 2

if(x > 0)

x = x + yForward flow,all path problem,

def/ref sets are the initial facts associated with each node

What happens when we add a loop?

Page 13: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Example: Definite Reaching Definitions

X1Y2

X4

{X1}

{X1,Y2}

{Y2}

{ }

{X1,Y2}

{X1}

{X1,Y2}

{X1,Y2}

{X4,Y2}

{Y2}

int x,y;...x := foo();y := x + 2;if x > 0 then x := x + y;end if;...

x = foo()

y = x + 2

if(x > 0)

x = x + yForward flow,all path problem,

def/ref sets are the initial facts associated with each node

XX

XX

What happens when we add a loop?

Page 14: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Computing values for a node

y:= …x

In(n)

Out(n)

Forward flow•Keep track of the definitions into a node that have not been redefined

•Flow out of a node depends on flow in and on what happens in the node

For definite reaching defs:

In(n) = kєpredOut(k)

Out(n) = In(n)-def(n) U def(nn)

Page 15: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Used to determine what variables need to be kept in a register after executing a node

Live Variables

• a variable, x, is live at node p if there exists a def-clear path wrt x from node p to a use of x

• x is live at 2, 3, 4, but not at node 5

:=x

x:=

5

1

2

3

4x:=6

Start from a use, and move backward on the graph until a def is encountered

Page 16: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Computing global information- live variables

live={x,y} 2

1

live={y}

Possible live variables

43live={x}

2

1

live={x, y}

Definite live variables

43live={x}

live={x}

Page 17: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Live Variables: Computing values for a node

y:= …x

In(n)

Out(n)

Backward flow•Keep track of the (forward) references that have not found their (backward) definition site

•Flow out of a node depends on flow in and on what happens in the node

Page 18: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

{}

{x}

{x,y}

{x,y}

{}Backward flow,any path problem,

def/ref sets are the initial facts associated with each node

Example: Possible Live Variables

{x,y}

{}

{x,y}

{x}

{ }int x,y;...x := foo();y := x + 2;if x > 0 then x := x + y;end if;...

x = foo()

y = x + 2

if(x > 0)

x = x + y

Page 19: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

{}

{x}

{x}

{}

{}

Example: Definite Live Variables

{x,y}

{}

{x}

{x}

{ }int x,y;...x := foo();y := x + 2;if x > 0 then x := x + y;end if;...

x = foo()

y = x + 2

if(x > 0)

x = x + yBackward flow,all path problem,

def/ref sets are the initial facts associated with each node

Page 20: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

{}

{x}

{x}

{}

{}

Example: Definite Live Variables

{x,y}

{}

{x}

{x}

{ }

int x,y;...x := foo();y := x + 2;if x > 0 then x := x + y;end if;...

x = foo()

y = x + 2

if(x > 0)

x = x + y

•Backward flow,all path problem •def/ref sets are the initial facts associated with each node

• In(n) = kєsucc(n)Out(k)Out(n) = In(n)-def(n) U ref(n)

Page 21: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Data Flow Analysis decisions

3

21

Union or Intersection

54

•Backward/Forward

•Any/All

•Facts

•Equations

•Initial values

Page 22: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Constant Propagation

• Some variables at a point in a program may only take on one value

• If we know this, can optimize the code when it is compiled

Page 23: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Constant Propagation

x = 3

y = x + 2

if(x > z)

x = x + y

int x,y;...x := 3;y := x + 2;if x > z then x := x + y;end if;...

Page 24: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Constant Propagation

x = 3

y = x + 2

if(x >z)

x = x + y

U=unknownN= not a constant

Forward flow,all paths problem

Facts are the computations and assignments made at each node

(U,U)

(3,U)

(3,5)

(N,5)(N,5)

(8,5)

(3,5)

int x,y;...x := 3;y := x + 2;if x > z then x := x + y;end if;...

(x,y)

(3,U)

(3,5)

(3,5)

Page 25: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Constant Propagation with a loop

(U,U)

(3,U)

(3,5)

(3,5)

(8,5)

(N,5)

(N,5)

(N,5)

x = 3

y = x + 2

if(x > z)

x = x + y

(3,U)

(3,5)(N,5)

U=unknown

N= not a constant

(3,5) (N,5)

Forward flow,all paths problem

Facts are the computations and assignments made at each node

Page 26: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Fixed point

• The data flow analysis algorithm will eventually terminate

• If there are only a finite number of possible sets that can be associated with a node

• If the function that determines the sets that can be associated with a node is monotonic

Page 27: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Constant Propagation with a loop

(U,U)

(3,U)

(3,5)

(3,5)

(8,5)

(N,5)

(N,5)

(N,5)

x = 3

y = x + 2

if(x > 0)

x = x + y

(3,U)

(3,5)(N,5)

U=unknownN= not a constant

Forward flow,all paths problemFacts are the computations and assignments made at each node

Page 28: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

DAVE: detects anomalous def/ref behavior

• L. D. Fosdick and Leon J. Osterweil, " Data Flow Analysis in Software Reliability", ACM Computing Surveys, September 1976, 8 (3), pp. 306-330.

• Application independent specification of erroneous behavior

• use of an uninitialized variable• redefinition of a variable that is not

referenced

Page 29: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Anomalous pairs of ref/def information

d - defined, r - referenced, u - undefined

d..r: defined variable reaches a reference

u..d: undefined variable reaches a definition

d..d: definition is redefined before being used

d..u: definition is undefined before being used

u..r: undefined variable reaches a reference

Unreferencedefinition

undefinedreference

Page 30: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Consider an unreferenced definition

• For a definition of a, want to know if that definition is not going to be referenced

• Is there some path where a is redefined or undefined before being used?

• May be indicative of a problem• Usually just a programming convenience and not a

problem

• For a definition of a, want to know if on all paths is a redefined or undefined before being used

• May be indicative of a problem• Or could just be wasteful

Page 31: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Some versus All

def a

def a

ref a

def a

def a

For some path, a is redefined without a subsequent reference

For all paths, a is redefined without a subsequent reference

Page 32: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Unreferenced definition: Computing values for a node

y:= …x

In(n)

Out(n)

Forward flow

•Keep track of the definitions into a node that have not been referenced

•Flow out of a node depends on flow in and on what happens in the node

Page 33: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Unreferenced definition: Computing values for a node

y:= …x

In(n)

Out(n)

Forward flow•Keep track of the definitions into a node that have not been referenced

•Flow out of a node depends on flow in and on what happens in the node

Forward flow,all paths problem

In(n) = kєpred(n)Out(k)

Out(n) = In(n)-ref(n) U def(n)

Page 34: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Unreferenced definitions

x = foo()

y = x + 2

if(x > 0)

x = x + 1

y:= zForward flow,all paths problem

{ }

{x}

{y}

{y}

{x,y}

{y}

int x,y;...x := foo();y := x + 2;if x > 0 then x := x + 1;end if;y:= …

(unreferenced defs)

{x}

{y}

{y}

{y}

Need to look at each node where there is a def

Page 35: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Continuing with the unreferenced def example

x = foo()

y = x + 2

if(x > 0)

x = x +1

y:= …

{ }

{x}

{y}

{y}{y}

{x,y}

{y}

{x}

{y}

{y}

•A definiton is redefined without being used if def(n) In(n)-ref(n)

•Must compute this for each node

•For this example, the last node would report a redefinition of y on all paths

•Finds the location where the redef occurs

def={x}ref={ }

def={y}ref={x}

def={}ref={x}

def={x}ref={x}

def={y}ref={ }

Page 36: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Unreferenced definitions, (finds the location of the first def of the pair)

x = foo()

y = x + 2

if(x > 0)

x = x + 1

y:= …Backward flow,all paths problem

{x,y}

{y}

{y} double def

{y}

{y}

{y}

int x,y;...x := foo();y := x + 2;if x > 0 then x := x + 1;end if;y := ...

(unreferenced defs)

{y}

{y}

{y}

{y}

Page 37: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

In values depend on direction

y:= … y:= …

In(n)

In(n)Out(n)

Out(n)

Forward flow Backward flow

Page 38: Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency

Data Flow Analysis Problem

• Need to determine the information that should be computed at a node

• Need to determine how that information should flow from node to node

• Backward or Forward• Union or Intersection

• Often there is more than one way to solve a problem

• Can often be solved forward or backward, but usually one way is easier than the other