Spring 2014 Program Analysis and Verification
Lecture 8: Static Analysis II
Roman Manevich, Ben-Gurion University

Syllabus
• Semantics
– Natural Semantics
– Structural semantics
– Axiomatic Verification
• Static Analysis
– Automating Hoare Logic
– Control Flow Graphs
– Equation Systems
– Collecting Semantics
• Abstract Interpretation fundamentals
– Lattices
– Galois Connections
– Fixed-Points
– Widening/Narrowing
– Domain constructors
– Interprocedural Analysis
• Analysis Techniques
– Numerical Domains
– CEGAR
– Alias analysis
– Shape Analysis
• Crafting your own
– Soot
– From proofs to abstractions
– Systematically developing transformers

Previously
• Static Analysis by example
– Simple Available Expressions analysis
– Abstract transformer for assignments
– Three-address code
– Processing serial composition
– Processing conditions
– Processing loops

Defining an SAV abstract transformer
• Goal: define a function FSAV[x:=a] on sets of factoids s.t. if FSAV[x:=a](D) = D’ then sp(x := a, Conj(D)) ⇒ Conj(D’)
• Idea: define rules for individual facts and generalize to sets of facts by the conjunction rule
• In the rules below, α is either a variable v or an addition expression v+w
{ x=α } x:=a { } [kill-lhs]
{ y=x+w } x:=a { } [kill-rhs-1]
{ y=w+x } x:=a { } [kill-rhs-2]
{ } x:=α { x=α } [gen]
{ y=z+w } x:=a { y=z+w } [preserve]

Defining a semantic reduction
• Idea: make as many implicit facts explicit by
– Using symmetry and transitivity of equality
– Commutativity of addition
– Meaning of equality – can substitute equal variables
• For an SAV-predicate P=Conj(D) define Explicate(D) = minimal set D* such that:
1. D ⊆ D*
2. x=y ∈ D* implies y=x ∈ D*
3. x=y ∈ D* and y=z ∈ D* implies x=z ∈ D*
4. x=y+z ∈ D* implies x=z+y ∈ D*
5. x=y ∈ D* and x=z+w ∈ D* implies y=z+w ∈ D*
6. x=y ∈ D* and z=x+w ∈ D* implies z=y+w ∈ D*
7. x=z+w ∈ D* and y=z+w ∈ D* implies x=y ∈ D*
• Notice that Conj(Explicate(D)) ⇔ Conj(D)
• Explicate is a special case of a semantic reduction
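The closure rules above can be sketched as a small saturation loop. This is only an illustrative sketch: the tuple encoding of facts (`("eq", x, y)` for x=y and `("add", x, y, z)` for x=y+z) and the function name `explicate` are assumptions made for this example, not part of the lecture.

```python
def explicate(facts):
    """Close a set of facts under rules 2-7 (rule 1 holds because we start from `facts`)."""
    closed = set(facts)
    while True:
        new = set()
        eqs = [f for f in closed if f[0] == "eq"]
        adds = [f for f in closed if f[0] == "add"]
        for (_, x, y) in eqs:
            new.add(("eq", y, x))                    # rule 2: symmetry
            for (_, y2, z) in eqs:
                if y2 == y:
                    new.add(("eq", x, z))            # rule 3: transitivity
            for (_, lhs, a, b) in adds:
                if lhs == x:
                    new.add(("add", y, a, b))        # rule 5: substitute the left-hand side
                if a == x:
                    new.add(("add", lhs, y, b))      # rule 6: substitute the first operand
        for (_, x, y, z) in adds:
            new.add(("add", x, z, y))                # rule 4: commutativity of addition
            for (_, x2, y2, z2) in adds:
                if (y2, z2) == (y, z):
                    new.add(("eq", x, x2))           # rule 7: equal right-hand sides
        if new <= closed:                            # nothing new: minimal closed set reached
            return closed
        closed |= new


D = {("eq", "x", "y"), ("add", "x", "z", "w")}
D_star = explicate(D)
```

The loop terminates because only finitely many facts can be built from the variables already present in D. Note that, read literally, rules 2 and 3 also produce reflexive facts such as x=x.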

Annotating assignments
• Define: F*[x:=aexpr] = Explicate ∘ FSAV[x:=aexpr]
• Annotate(P, x:=aexpr) = {P} x:=aexpr {F*[x:=aexpr](P)}

Annotating composition
• Annotate(P, S1; S2) =
    let Annotate(P, S1) be {P} A1 {Q1}
    let Annotate(Q1, S2) be {Q1} A2 {Q2}
    return {P} A1; {Q1} A2 {Q2}

Simplifying conditions
• Extend While with
– Non-determinism (or) and
– An assume statement: ⟨assume b, s⟩ ⇒sos s if B⟦b⟧ s = tt
• Now, the following two statements are equivalent
– if b then S1 else S2
– (assume b; S1) or (assume ¬b; S2)

assume transformer
• Define the factoid set of bexpr as {bexpr} if bexpr is a factoid, and {} otherwise
• Define F[assume bexpr](D) = D ∪ (factoid set of bexpr)
• Can sharpen: F*[assume bexpr] = Explicate ∘ FSAV[assume bexpr]

Annotating conditions
let Pt = F*[assume bexpr] P
let Pf = F*[assume ¬bexpr] P
let Annotate(Pt, S1) be {Pt} A1 {Q1}
let Annotate(Pf, S2) be {Pf} A2 {Q2}
return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}

k-loop unrolling
{ P }
Inv = { N }
while (x ≠ z) do
    x := x + 1
    y := x + a
    d := x + a

Unrolled:
{ P }
if (x ≠ z)
    x := x + 1
    y := x + a
    d := x + a
Q1 = { y=x+a, y=a+x }
if (x ≠ z)
    x := x + 1
    y := x + a
    d := x + a
Q2 = { y=x+a, y=a+x }
…

Example with P = { y=x+a, y=a+x, w=d, d=w }: the first unrolling again gives Q1 = { y=x+a, y=a+x }

The following must hold: P ⇒ N, Q1 ⇒ N, Q2 ⇒ N, …, Qk ⇒ N, …
We can compute the following sequence:
N0 = P
N1 = N0 ⊔ Q1
N2 = N1 ⊔ Q2
…
Nk = Nk-1 ⊔ Qk
Observation 1: No need to explicitly unroll loop – we can reuse postcondition from unrolling k-1 for k

Annotating loops
Annotate(P, while bexpr do S) =
    Initialize N := Nc := P
    repeat
        let Annotate(Nc, if bexpr then S else skip) be {Nc} if bexpr then S else skip {N}
        Nc := Nc ⊔ N
    until N = Nc
    return {P} INV= {N} while bexpr do {F[assume bexpr](N)} Annotate(F[assume bexpr](N), S) {F[assume ¬bexpr](N)}
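The iteration above can be sketched in a few lines. This is a simplified variant, not the lecture's exact procedure: it assumes the loop condition is not a factoid (so the assume step adds nothing), it takes join to be set intersection, it skips Explicate, and it uses a toy `assign` transformer that kills every fact mentioning the assigned variable and adds hand-written generated facts. All function names and the tuple encoding of facts are assumptions made for this example.

```python
def vars_of(fact):
    """Variables mentioned by a fact such as ("add", "y", "x", "a") or ("eq", "w", "d")."""
    return set(fact[1:])


def assign(x, gen):
    """Toy transformer for x := rhs: kill every fact mentioning x, then add `gen`."""
    return lambda D: frozenset(f for f in D if x not in vars_of(f)) | frozenset(gen)


def compose(*transformers):
    """Transformer for sequential composition S1; S2; ..."""
    def run(D):
        for t in transformers:
            D = t(D)
        return D
    return run


def loop_invariant(P, body, join):
    """Iterate Nc := Nc join body(Nc) until nothing changes; the result is the invariant N."""
    Nc = frozenset(P)
    while True:
        N = join(Nc, body(Nc))
        if N == Nc:
            return N
        Nc = N


# Loop body: x := x + 1; y := x + a; d := x + a
body = compose(
    assign("x", []),  # the right-hand side mentions x itself, so nothing is generated
    assign("y", [("add", "y", "x", "a"), ("add", "y", "a", "x")]),
    assign("d", [("add", "d", "x", "a"), ("add", "d", "a", "x")]),
)
P = {("add", "y", "x", "a"), ("add", "y", "a", "x"), ("eq", "w", "d"), ("eq", "d", "w")}
inv = loop_invariant(P, body, lambda A, B: A & B)
print(sorted(inv))  # [('add', 'y', 'a', 'x'), ('add', 'y', 'x', 'a')]
```

Under these assumptions the iteration stabilizes after one round on { y=x+a, y=a+x }, the same facts as Q1 in the unrolling example.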

Annotating programs
Annotate(P, S) =
    case S is x:=aexpr
        return {P} x:=aexpr {F*[x:=aexpr] P}
    case S is S1; S2
        let Annotate(P, S1) be {P} A1 {Q1}
        let Annotate(Q1, S2) be {Q1} A2 {Q2}
        return {P} A1; {Q1} A2 {Q2}
    case S is if bexpr then S1 else S2
        let Pt = F[assume bexpr] P
        let Pf = F[assume ¬bexpr] P
        let Annotate(Pt, S1) be {Pt} A1 {Q1}
        let Annotate(Pf, S2) be {Pf} A2 {Q2}
        return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
    case S is while bexpr do S
        N := Nc := P // Initialize
        repeat
            let Pt = F[assume bexpr] Nc
            let Annotate(Pt, S) be {Nc} Abody {N}
            Nc := Nc ⊔ N
        until N = Nc
        return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}

Today
• Another static analysis example – constant propagation
• Basic concepts in static analysis
– Control flow graphs
– Equation systems
– Collecting semantics
– (Trace semantics)

Constant propagation

Second static analysis example
• Optimization: constant folding
– Example: x:=7; y:=x*9 transformed to: x:=7; y:=7*9 and then to: x:=7; y:=63
• Analysis: constant propagation (CP)
– Infers facts of the form x=c
• Constant folding rule: given { x=c }, rewrite y := aexpr to y := eval(aexpr[c/x]), where eval simplifies constant expressions
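A minimal sketch of the folding step, assuming expressions are encoded as integers, variable names, or `(op, left, right)` tuples. The encoding and the names `fold` and `OPS` are illustrative assumptions, not the lecture's notation.

```python
import operator

# Operators supported by this sketch (an assumption; the slide only shows "*")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr, facts):
    """Substitute known constants from `facts` (a dict var -> int) and simplify."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return facts.get(expr, expr)      # aexpr[c/x]: replace x by c when x=c is known
    op, left, right = expr
    l, r = fold(left, facts), fold(right, facts)
    if isinstance(l, int) and isinstance(r, int):
        return OPS[op](l, r)              # eval: both operands are constants
    return (op, l, r)                     # otherwise keep the partially substituted expression


print(fold(("*", "x", 9), {"x": 7}))      # 63
```

With the fact x=7, the right-hand side x*9 from the slide's example folds to 63. An expression with an unknown variable is only partially substituted.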

Plan
• Define domain – set of allowed assertions
• Handle assignments
• Handle composition
• Handle conditions
• Handle loops

Constant propagation domain

CP semantic domain
• Define CP-factoids: Φ = { x = c | x ∈ Var, c ∈ Z }
– How many factoids are there?
• Define predicates as sets of factoids: 2^Φ
– How many predicates are there?
– Do all predicates make sense? (x=5) ∧ (x=7)
• Treat conjunctive formulas as sets of factoids: {x=5, y=7} ~ (x=5) ∧ (y=7)
Handling assignments

CP abstract transformer
• Goal: define a function FCP[x:=aexpr] from predicates to predicates such that if FCP[x:=aexpr] P = P’ then sp(x:=aexpr, P) ⇒ P’
{ x=c } x:=aexpr { } [kill]
{ } x:=c { x=c } [gen-1]
{ y=c1, z=c2 } x:=y op z { x=c } where c = c1 op c2 [gen-2]
{ y=c } x:=aexpr { y=c } [preserve]

Gen-kill formulation of transformers
• Suited for analysis propagating sets of factoids
– Available expressions,
– Constant propagation, etc.
• For each statement, define a set of killed factoids and a set of generated factoids: F[S] P = (P \ kill(S)) ∪ gen(S)
• FCP[x:=aexpr] P = (P \ {x=c}) if aexpr is not a constant
• FCP[x:=k] P = (P \ {x=c}) ∪ {x=k}
• Used in dataflow analysis – a special case of abstract interpretation
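The gen-kill shape of the CP transformer can be sketched as follows. The sketch represents a consistent predicate as a dict from variables to constants and combines the kill, gen-1, gen-2 and preserve rules. Two things go beyond the slide and are assumptions of this example: operands may be literals as well as variables, and a copy x := y propagates the constant of y. The names `transfer_assign` and `value_of` are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def value_of(operand, P):
    """Constant value of an operand under P, or None if it is not known to be constant."""
    if isinstance(operand, int):
        return operand
    return P.get(operand)


def transfer_assign(P, x, aexpr):
    """F[x := aexpr] P = (P minus kill) union gen, with P a dict {var: const}."""
    # gen is computed from the precondition P, before facts about x are killed
    if isinstance(aexpr, tuple):                      # x := y op z  (gen-2)
        op, y, z = aexpr
        c1, c2 = value_of(y, P), value_of(z, P)
        c = OPS[op](c1, c2) if c1 is not None and c2 is not None else None
    else:                                             # x := c  (gen-1), or a copy x := y
        c = value_of(aexpr, P)
    out = {v: k for v, k in P.items() if v != x}      # kill facts about x, preserve the rest
    if c is not None:
        out[x] = c                                    # gen the new fact x=c
    return out


print(transfer_assign({"x": 7}, "y", ("*", "x", 9)))  # {'x': 7, 'y': 63}
```

When the right-hand side is not known to be constant, only the kill part applies, which matches the first FCP equation on the slide.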

Handling composition

Does this still work?
Annotate(P, S1; S2) =
    let Annotate(P, S1) be {P} A1 {Q1}
    let Annotate(Q1, S2) be {Q1} A2 {Q2}
    return {P} A1; {Q1} A2 {Q2}

Handling conditions

Handling conditional expressions
• We want to soundly approximate D ∧ bexpr and D ∧ ¬bexpr in the CP domain
• Define the factoid set of bexpr as {bexpr} if bexpr is a CP-factoid, and {} otherwise
• Define F[assume bexpr](D) = D ∪ (factoid set of bexpr)

Does this still work?
let Pt = F[assume bexpr] P
let Pf = F[assume ¬bexpr] P
let Annotate(Pt, S1) be {Pt} A1 {Q1}
let Annotate(Pf, S2) be {Pf} A2 {Q2}
return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
How do we define join for CP?

Join example
• {x=5, y=7} ⊔ {x=3, y=7, z=9} = ?
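One sound way to define the join, assuming predicates are read as conjunctions of factoids, is set intersection: a factoid survives only if it holds on both branches. The slide leaves the definition open, so treat this choice as an assumption of the sketch below. The name `join` and the pair encoding of factoids are also illustrative.

```python
def join(P1, P2):
    """Keep only the factoids guaranteed by both predicates."""
    return P1 & P2


left = {("x", 5), ("y", 7)}
right = {("x", 3), ("y", 7), ("z", 9)}
print(join(left, right))  # {('y', 7)}
```

The result is implied by each argument, which is what the postcondition {Q1 ⊔ Q2} of a conditional requires.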
Handling loops

Does this still work?
• What about correctness?
– If loop terminates then is N a loop invariant?
• What about termination?
Annotate(P, while bexpr do S) =
    N := Nc := P // Initialize
    repeat
        let Pt = F[assume bexpr] Nc
        let Annotate(Pt, S) be {Nc} Abody {N}
        Nc := Nc ⊔ N
    until N = Nc
    return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}

A termination principle
• g : X → X is a function
• How can we determine whether the sequence x0, x1 = g(x0), …, xk+1 = g(xk), … stabilizes?
• Technique:
1. Find ranking function rank : X → N (that is, show that rank(x) ≥ 0 for all x)
2. Show that if x ≠ g(x) then rank(g(x)) < rank(x)
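The principle can be sketched as an iteration that checks the ranking condition at every step. The helper name `iterate_to_fixpoint` and the example function `g` (intersecting with a fixed set of facts) are assumptions chosen for illustration.

```python
def iterate_to_fixpoint(g, x0, rank):
    """Compute x0, g(x0), g(g(x0)), ... until it stabilizes, checking the rank condition."""
    x, steps = x0, 0
    while g(x) != x:
        nxt = g(x)
        # condition 2: whenever x changes, the rank must strictly decrease (and stay >= 0)
        if not (0 <= rank(nxt) < rank(x)):
            raise ValueError("rank did not strictly decrease")
        x, steps = nxt, steps + 1
    return x, steps


# Hypothetical example: g can only drop facts, and rank counts the remaining facts
N = frozenset({"y=7"})
g = lambda P: P & N
x0 = frozenset({"x=5", "y=7", "z=9"})
fixpoint, steps = iterate_to_fixpoint(g, x0, len)
print(sorted(fixpoint), steps)  # ['y=7'] 1
```

Because the rank is a natural number that drops on every change, the number of steps is bounded by rank(x0).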

Rank function for available expressions
• rank(P) = |P| (number of factoids)
• Prove that either Nc = Nc ⊔ N or rank(Nc ⊔ N) <? rank(Nc)
Annotate(P, while bexpr do S) =
    N := Nc := P // Initialize
    repeat
        let Pt = F[assume bexpr] Nc
        let Annotate(Pt, S) be {Nc} Abody {N}
        Nc := Nc ⊔ N
    until N = Nc
    return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}

Rank function for constant propagation
• rank(P) = |P| (number of factoids)
• Prove that either Nc = Nc ⊔ N’ or rank(Nc) >? rank(Nc ⊔ N’)
Annotate(P, while bexpr do S) =
    N’ := Nc := P // Initialize
    repeat
        let Pt = F[assume bexpr] Nc
        let Annotate(Pt, S) be {Nc} Abody {N’}
        Nc := Nc ⊔ N’
    until N’ = Nc
    return {P} INV= {N’} while bexpr do {Pt} Abody {F[assume ¬bexpr](N’)}

Generalizing
[Figure: Available Expressions and Constant Propagation generalized to Abstract Interpretation. Image by NMZ (Photoshop) [CC0], via Wikimedia Commons]

Towards a recipe for static analysis
• Two static analyses
– Available Expressions (extended with equalities)
– Constant Propagation
• Semantic domain – a family of formulas
– Join operator approximates pairs of formulas
• Abstract transformers for basic statements
– Assignments
– assume statements
• Initial precondition

Control flow graphs

A technical issue
• Unrolling loops is quite inconvenient and inefficient (but we can avoid it as we just saw)
• How do we handle more complex control-flow constructs, e.g., goto, break, exceptions…?
– The problem: non-inductive control flow constructs
• Solution: model control-flow by labels and goto statements
• Would like a dedicated data structure to explicitly encode control flow in support of the analysis
• Solution: control-flow graphs (CFGs)

Modeling control flow with labels
while (x ≠ z) do
    x := x + 1
    y := x + a
    d := x + a
a := b

label0:
    if x = z goto label1
    x := x + 1
    y := x + a
    d := x + a
    goto label0
label1:
    a := b

Control-flow graph example
1 label0:
2     if x = z goto label1
3     x := x + 1
4     y := x + a
5     d := x + a
6     goto label0
7 label1:
8     a := b
[Figure: control-flow graph with one node per line number, 1 to 8]

Control-flow graph example
[Figure: the same control-flow graph, nodes 1 to 8, with special entry and exit nodes added]

Control-flow graph
• Nodes are statements or labels
• Special nodes for entry/exit
• An edge from node v to node w means that after executing the statement of v control passes to w
– Conditions represented by split and join nodes
– Loops create cycles
• Can be generated from abstract syntax tree in linear time
– Automatically taken care of by the front-end
• Usage: store analysis results (assertions) in CFG nodes
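A minimal sketch of CFG construction for a labeled program, done in one pass after collecting label positions. The instruction encoding (`("label", name)`, `("if_goto", cond, target)`, `("goto", target)`, `("assign", text)`) and the function name `build_cfg` are assumptions made for this example, and the program is a small illustrative countdown loop.

```python
def build_cfg(program):
    """Return a successor map: node -> list of successor nodes.

    Nodes are the 1-based line numbers of `program` plus "entry" and "exit".
    """
    labels = {ins[1]: i for i, ins in enumerate(program, start=1) if ins[0] == "label"}
    n = len(program)

    def fallthrough(i):
        return i + 1 if i < n else "exit"

    edges = {"entry": [1 if n else "exit"], "exit": []}
    for i, ins in enumerate(program, start=1):
        kind = ins[0]
        if kind == "goto":
            edges[i] = [labels[ins[1]]]                    # unconditional jump
        elif kind == "if_goto":
            edges[i] = [fallthrough(i), labels[ins[2]]]    # split: fall through or jump
        else:
            edges[i] = [fallthrough(i)]                    # labels and assignments
    return edges


program = [
    ("label", "label0"),                # 1
    ("if_goto", "x <= 0", "label1"),    # 2
    ("assign", "x := x - 1"),           # 3
    ("goto", "label0"),                 # 4
    ("label", "label1"),                # 5
]
cfg = build_cfg(program)
print(cfg[2], cfg[4], cfg[5])  # [3, 5] [1] ['exit']
```

The conditional at line 2 is a split node, and the goto at line 4 closes the cycle back to line 1. Both passes are linear in the program length.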
Eliminating labels
• We can use edges to point to the nodes following labels and remove all label nodes (other than entry/exit)

Control-flow graph example
[Figure: the control-flow graph after eliminating labels; the remaining nodes are entry, 2 (the loop condition), 3 (x := x + 1), 4 (y := x + a), 5 (d := x + a), 8 (a := b) and exit]

Basic blocks
• A basic block is a chain of nodes with a single entry point and a single exit point
• Entry/exit nodes are separate blocks
[Figure: the label-free control-flow graph with nodes entry, 2, 3, 4, 5, 8 and exit]
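Basic blocks can be sketched as maximal chains in a successor map. The sketch assumes every node is reachable from entry and uses the label-free graph from the example, with node 2 branching to 3 and 8. The function names are hypothetical.

```python
def basic_blocks(succ):
    """Group CFG nodes into maximal chains with a single entry and a single exit."""
    preds = {v: [] for v in succ}
    for v, ws in succ.items():
        for w in ws:
            preds[w].append(v)

    def continues_chain(v):
        # v extends its predecessor's block if that edge is the only way in and the only way out
        if v in ("entry", "exit") or len(preds[v]) != 1:
            return False
        p = preds[v][0]
        return p not in ("entry", "exit") and len(succ[p]) == 1

    blocks = []
    for v in succ:
        if continues_chain(v):
            continue                      # v belongs to a block started earlier in the chain
        block = [v]
        while len(succ[block[-1]]) == 1 and continues_chain(succ[block[-1]][0]):
            block.append(succ[block[-1]][0])
        blocks.append(block)
    return blocks


succ = {"entry": [2], 2: [3, 8], 3: [4], 4: [5], 5: [2], 8: ["exit"], "exit": []}
print(basic_blocks(succ))  # [['entry'], [2], [3, 4, 5], [8], ['exit']]
```

Nodes 3, 4 and 5 form one block, while entry and exit stay in blocks of their own.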

Blocked CFG
• Stores basic blocks in a single node
• Extended blocks – maximal connected loop-free subgraphs
[Figure: blocked control-flow graph in which the statements x := x + 1, y := x + a and d := x + a share a single node]

Collecting semantics

Why need another semantics?
• Operational semantics explains how to compute output from a given input
– Useful for implementing an interpreter/compiler
– Less useful for reasoning about safety properties
– Not suitable for computational purposes – does not explicitly show how assertions in different program points influence each other
• Need a more explicit semantics
– Over a control flow graph

Control-flow graph example
1 label0:
2     if x <= 0 goto label1
3     x := x - 1
4     goto label0
5 label1:
[Figure: control-flow graph with entry, nodes 1 to 5 and exit; node 2 is drawn as the test if x > 0]

Trimmed CFG
[Figure: the same graph with label and goto nodes removed, leaving entry, 2 (if x > 0), 3 (x := x - 1) and exit]

Collecting semantics example: input 1
• Input states: [x↦1], [x↦2], [x↦3], …
[Figure: trimmed CFG annotated with the states reached from input [x↦1]: [x↦1] at entry and on the true branch of the test, [x↦0] after x := x - 1 and at exit]

Collecting semantics example: input 2
• Input states: [x↦1], [x↦2], [x↦3], …
[Figure: trimmed CFG; the states reached from input [x↦2] are accumulated on top of those from input 1, so [x↦2], [x↦1] and [x↦0] now appear on the graph]

Collecting semantics example: input 3
• Input states: [x↦1], [x↦2], [x↦3], …
[Figure: trimmed CFG; the states reached from input [x↦3] are accumulated as well, so [x↦3], [x↦2], [x↦1] and [x↦0] now appear on the graph]

ad infinitum – fixed point
[Figure: trimmed CFG at the fixed point: the states [x↦1], [x↦2], [x↦3], … accumulate along the loop, [x↦0] appears after x := x - 1, and [x↦0], [x↦-1], [x↦-2], … are listed at exit]

Predicates at fixed point
[Figure: trimmed CFG annotated with predicates describing the collected states: {true} at entry, {true} before the test if x > 0, {x>0} on the true branch, {x≥0} after x := x - 1, and {x≤0} at exit]

Collecting semantics
• Accumulates for each control-flow node the (possibly infinite) sets of states that can reach there by executing the program from some given set of input states
• Not computable in general
• A reference point for static analysis
• (An abstraction of the trace semantics)
• We will work our way up to defining it formally

Collecting semantics in equational form

Math reference: function lifting
• Let f : X → Y be a function
• The lifted function f’ : 2^X → 2^Y is defined as f’(XS) = { f(x) | x ∈ XS }
• We will sometimes use the same symbol for both functions when it is clear from the context which one is used
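Lifting is a one-liner over finite sets. The names `lift` and `dec` below are illustrative; `dec` plays the role of the semantic function of x := x - 1 on a state that holds only x.

```python
def lift(f):
    """Lift f : X -> Y to sets: lift(f)(XS) = { f(x) | x in XS }."""
    return lambda xs: {f(x) for x in xs}


dec = lambda x: x - 1          # effect of x := x - 1 on the value of x
dec_sets = lift(dec)
print(dec_sets({1, 2, 3}))     # {0, 1, 2}
```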

Equational definition example
• A vector of variables R[0, 1, 2, 3, 4]
• R[0] = {x∈Z} // established input
  R[1] = R[0] ∪ R[4]
  R[2] = R[1] ∩ {s | s(x) > 0} // semantic function for assume x>0
  R[3] = R[1] ∩ {s | s(x) ≤ 0}
  R[4] = ⟦x:=x-1⟧ R[2] // semantic function for x:=x-1 lifted to sets of states
• A (recursive) system of equations
[Figure: CFG with nodes entry, if x > 0, x := x-1 and exit; R[0] labels the edge out of entry, R[1] the input of the test, R[2] its true branch, R[3] its false branch into exit, and R[4] the output of x := x-1]

General definition
• A vector of variables R[0, …, k] one per input/output of a node
– R[0] is for entry
• For node n with multiple predecessors add equation R[n] = ∪ {R[k] | k is a predecessor of n}
• For an atomic operation node S with input R[m] and output R[n] add equation R[n] = ⟦S⟧ R[m]
• Transform if b then S1 else S2 to (assume b; S1) or (assume ¬b; S2)
[Figure: the CFG of the previous example, with edges labeled R[0] to R[4]]
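For the loop while x > 0 do x := x - 1, the equation system can be solved by repeated substitution, starting from empty sets. Since the collecting semantics is not computable in general, this sketch restricts R[0] to a finite set of inputs and represents a state by the value of x alone. Both restrictions, and the function name `solve`, are assumptions of the example.

```python
def solve(inputs):
    """Iterate the five equations for the loop example until nothing changes."""
    R = [set() for _ in range(5)]
    while True:
        new = [
            set(inputs),                  # R[0]: input states
            R[0] | R[4],                  # R[1] = R[0] union R[4]
            {s for s in R[1] if s > 0},   # R[2]: assume x > 0
            {s for s in R[1] if s <= 0},  # R[3]: assume not (x > 0)
            {s - 1 for s in R[2]},        # R[4]: x := x - 1 lifted to sets of states
        ]
        if new == R:
            return R
        R = new


R = solve({1, 2, 3})
print(R[1], R[3])  # {0, 1, 2, 3} {0}
```

For inputs 1, 2 and 3, the only state reaching exit is x=0, in line with the input-by-input examples.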

Next lecture: abstract interpretation fundamentals