symbolic executionloris/cs703/cs703material/se.pdfsubhajit roy (iit kanpur) symbolic execution...
TRANSCRIPT
Symbolic Execution
Subhajit Roy
Indian Institute of Technology Kanpur
November 8, 2019
Introduction
Outline
1 Introduction
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 2 / 36
Introduction
Correctness of Software
Program Verification (sound, incomplete, infinite inputs)
Program Testing1 (unsound, complete, finite inputs)
Symbolic Execution (sound, incomplete, infinite inputs)
1The testing community often uses different definitions of soundness and completenessSubhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 3 / 36
Introduction
Correctness of Software
Program Verification (sound, incomplete, infinite inputs)
Program Testing1 (unsound, complete, finite inputs)
Symbolic Execution (sound, incomplete, infinite inputs)
1The testing community often uses different definitions of soundness and completenessSubhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 3 / 36
Introduction
Correctness of Software
Program Verification (sound, incomplete, infinite inputs)
Program Testing1 (unsound, complete, finite inputs)
Symbolic Execution (sound, incomplete, infinite inputs)
1The testing community often uses different definitions of soundness and completenessSubhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 3 / 36
Introduction
Symbolic Execution
Analyse this
What inputs cause this program to violate the assertion?
1 i n t main ( ) {
3 i n p u t ( a , b , c , d ) ;i f ( a <= b ){
5 c++;}
7 e l s e {d++;
9 i f ( c == 2∗d )a s s e r t ( a > d )
11 }}
13
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 4 / 36
Introduction
Symbolic Execution
Analyze this
OK, let’s answer this!
A customer buys 5 hot dogs and 5 bags of potato chips for $12.50. Anothercustomer buys 3 hot dogs and 4 bags of potato chips for $8.25. Find the costof each item.a
== Use symbols to represent unknowns! ==
ahttps://www.wyzant.com/resources/answers/107505/fond the cost of each item
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 5 / 36
Introduction
Symbolic Execution
Analyze this
OK, let’s answer this!
A customer buys 5 hot dogs and 5 bags of potato chips for $12.50. Anothercustomer buys 3 hot dogs and 4 bags of potato chips for $8.25. Find the costof each item.a
== Use symbols to represent unknowns! ==
ahttps://www.wyzant.com/resources/answers/107505/fond the cost of each item
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 5 / 36
Introduction
Symbolic Execution
Analyze this
OK, let’s answer this!
A customer buys 5 hot dogs and 5 bags of potato chips for $12.50. Anothercustomer buys 3 hot dogs and 4 bags of potato chips for $8.25. Find the costof each item.a
== Use symbols to represent unknowns! ==
ahttps://www.wyzant.com/resources/answers/107505/fond the cost of each item
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 5 / 36
Introduction
Symbolic Execution
Simple idea
Execute a program with symbolic inputs!
i n t main ( ) {2
i n p u t ( a , b , c , d ) ;4 i f ( a <= b ){
c++;6 }
e l s e {8 d++;
i f ( c == 2∗d )10 a s s e r t ( a > d )
}12 }
VariableName SymbolicNamea α0b α1c α2d α3
Analyze the Path Condition
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 6 / 36
Introduction
Symbolic Execution
Simple idea
Execute a program with symbolic inputs!
i n t main ( ) {2
i n p u t ( a , b , c , d ) ;4 i f ( a <= b ){
c++;6 }
e l s e {8 d++;
i f ( c == 2∗d )10 a s s e r t ( a > d )
}12 }
VariableName SymbolicNamea α0b α1c α2d α3
Analyze the Path Condition
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 6 / 36
Introduction
Symbolic Execution
Simple idea
Execute a program with symbolic inputs!
i n t main ( ) {2
i n p u t ( a , b , c , d ) ;4 i f ( a <= b ){
c++;6 }
e l s e {8 d++;
i f ( c == 2∗d )10 a s s e r t ( a > d )
}12 }
VariableName SymbolicNamea α0b α1c α2d α3
Analyze the Path Condition
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 6 / 36
Introduction
Symbolic Execution
Simple idea
Execute a program with symbolic inputs!
i n t main ( ) {2
i n p u t ( a , b , c , d ) ;4 i f ( a <= b ){
c++;6 }
e l s e {8 d++;
i f ( c == 2∗d )10 a s s e r t ( a > d )
}12 }
VariableName SymbolicNamea α0b α1c α2d α3
Analyze the Path Condition
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 6 / 36
Introduction
Bounded Model Checking v/s Symbolic Execution
BMC
φk = STEP (X0, X1) ∧ STEP (X1, X2) · · · ∧ STEP (Xk−1, Xk)
SE
φk = STEP (X0, X1) ∧ STEP (X1, X2) · · · ∧ STEP (Xk−1, Xk)
... but on a single path!
SE can be seen as performing BMC on each path in isolation.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 7 / 36
Introduction
Bounded Model Checking v/s Symbolic Execution
BMC
φk = STEP (X0, X1) ∧ STEP (X1, X2) · · · ∧ STEP (Xk−1, Xk)
SE
φk = STEP (X0, X1) ∧ STEP (X1, X2) · · · ∧ STEP (Xk−1, Xk)
... but on a single path!
SE can be seen as performing BMC on each path in isolation.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 7 / 36
Introduction
Bounded Model Checking v/s Symbolic Execution
BMC
φk = STEP (X0, X1) ∧ STEP (X1, X2) · · · ∧ STEP (Xk−1, Xk)
SE
φk = STEP (X0, X1) ∧ STEP (X1, X2) · · · ∧ STEP (Xk−1, Xk)
... but on a single path!
SE can be seen as performing BMC on each path in isolation.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 7 / 36
Introduction
Gaining Coverage
How to explore multiple (potentially, all) paths:
Concolic execution Concrete execution on random input, collectconstraints, edit constraints, solve for new path
EGT (Execution generated testing) Symbolically execute, fork at branches
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 8 / 36
Introduction
Gaining Coverage
How to explore multiple (potentially, all) paths:
Concolic execution Concrete execution on random input, collectconstraints, edit constraints, solve for new path
EGT (Execution generated testing) Symbolically execute, fork at branches
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 8 / 36
Introduction
EGT
i n t main ( ) {2
i n p u t ( a , b , c , d ) ;4 s y m b o l i c ( a , b , c , d ) ;
i f ( a <= b ){6 c++;
i f ( c <= d )8 p r i n t f ( ” Hi\n” ) ;
}10 e l s e {
d++;12 i f ( c∗c == d )
p r i n t f ( ”Bye\n” ) ;14 }}
16
VariableName SymbolicNamea α0b α1c α2d α3
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 9 / 36
Introduction
EGT
1 i n t main ( ) {
3 i n p u t ( a , b , c , d ) ;s y m b o l i c ( a , b , c , d ) ;
5 i f ( a <= b ){c++;
7 i f ( c <= d )p r i n t f ( ” Hi\n” ) ;
9 }e l s e {
11 d++;i f ( c∗c == d )
13 p r i n t f ( ”Bye\n” ) ;}
15 }
VariableName SymbolicNamea α0b α1c α2d α3
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 9 / 36
Introduction
EGT
Covering behaviors
Fork at each branch if both sides are feasible (use an SMT solver)
1 i n t main ( ) {
3 i n p u t ( a , b , c , d ) ;s y m b o l i c ( a , b , c , d ) ;
5 −→ i f ( a <= b){c++;
7 i f ( c <= d )p r i n t f ( ” Hi\n” ) ;
9 }e l s e {
11 d++;i f ( c∗c == d )
13 p r i n t f ( ”Bye\n” ) ;}
15 }
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 10 / 36
Introduction
EGT
Covering behaviors
Fork at each branch if both sides are feasible (use an SMT solver)
1 i n t main ( ) {
3 i n p u t ( a , b , c , d ) ;s y m b o l i c ( a , b , c , d ) ;
5 i f ( a <= b ){c++;
7 −→ i f ( c <= d)p r i n t f ( ” Hi\n” ) ;
9 }e l s e {
11 d++;−→ i f ( c*c == d)
13 p r i n t f ( ”Bye\n” ) ;}
15 }
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 11 / 36
Introduction
EGT
Generating test
Solve the path condition (PC) to synthesize testcases
1 i n t main ( ) {i n t main ( ) {
3
i n p u t ( a , b , c , d ) ;5 s y m b o l i c ( a , b , c , d ) ;
i f ( a <= b ){7 c++;
i f ( c <= d )9 p r i n t f ( ” Hi\n” ) ;
}11 e l s e {
d++;13 i f ( c∗c == d )
p r i n t f ( ”Bye\n” ) ;15 }−→}
17
Path PC Assignment
1 (α0 <= α1) ∧ (α2 + 1 <= α3) (0,1,2,3)
2 (α0 <= α1) ∧ (α2 + 1 > α3) (0,1,4,3)
3 (α0 > α1) ∧ (α2 ∗ α2 == α3 + 1) (1,0,2,3)
4 (α0 > α1) ∧ (α2 ∗ α2 6= α3 + 1) (1,0,0,0)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 12 / 36
Introduction
EGT
Generating test
Solve the path condition (PC) to synthesize testcases
18 i n t main ( ) {i n t main ( ) {
20
i n p u t ( a , b , c , d ) ;22 s y m b o l i c ( a , b , c , d ) ;
i f ( a <= b ){24 c++;
i f ( c <= d )26 p r i n t f ( ” Hi\n” ) ;
}28 e l s e {
d++;30 i f ( c∗c == d )
p r i n t f ( ”Bye\n” ) ;32 }−→}
34
Path PC Assignment
1 (α0 <= α1) ∧ (α2 + 1 <= α3) (0,1,2,3)
2 (α0 <= α1) ∧ (α2 + 1 > α3) (0,1,4,3)
3 (α0 > α1) ∧ (α2 ∗ α2 == α3 + 1) (1,0,2,3)
4 (α0 > α1) ∧ (α2 ∗ α2 6= α3 + 1) (1,0,0,0)Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 12 / 36
Introduction
Concolic Testing
1 Run program on a random input
2 Collect the path condition as the program executes
3 Modify the path condition when program terminates
4 Solve modified path condition to generate new input for the next run of theprogram
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 13 / 36
Introduction
Handling real-world programs
external function calls
vector instructions
system calls
floating-point instructions
non-linear arithmetic ...
== Concretization and Virtualization ==
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 14 / 36
Introduction
Handling real-world programs
external function calls
vector instructions
system calls
floating-point instructions
non-linear arithmetic ...
== Concretization and Virtualization ==
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 14 / 36
Introduction
Concretization
i n t main ( ) {36 r e a d ( x ) ;
i f ( x > 0) {38 y = foo ( x );
i f ( y > 150)40 p r i n t ( ” l e s s ” ) ;
i f ( y > 250)42 a s s e r t ( 0 ) ;
i f ( y != a )44 a s s e r t ( 0 ) ;
i f ( y < 0)46 a s s e r t ( 0 ) ;
}48 }
Possible solutions
Overapproxy ← ∗
Underapprox (concretization)y ← 0
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 15 / 36
Introduction
Concretization
50 i n t main ( ) {r e a d ( x ) ;
52 i f ( x > 0) {y = foo ( x );
54 i f ( y > 150)p r i n t ( ” l e s s ” ) ;
56 i f ( y > 250)a s s e r t ( 0 ) ;
58 i f ( y != a )a s s e r t ( 0 ) ;
60 i f ( y < 0)a s s e r t ( 0 ) ;
62 }}
64
Possible solutions
Overapproxy ← ∗
Underapprox (concretization)y ← 0
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 15 / 36
Introduction
Concretization
i n t main ( ) {66 r e a d ( x ) ;
i f ( x > 0) {68 y = foo ( x );
i f ( y > 150)70 p r i n t ( ” l e s s ” ) ;
i f ( y > 250)72 a s s e r t ( 0 ) ;
i f ( y != a )74 a s s e r t ( 0 ) ;
i f ( y < 0)76 a s s e r t ( 0 ) ;
}78 }
Possible solutions
Overapproxy ← ∗
Underapprox (concretization)y ← 0
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 15 / 36
Introduction
Concretization
80 i n t main ( ) {r e a d ( x ) ;
82 i f ( x > 0) {y = foo ( x );
84 i f ( y > 150)p r i n t ( ” l e s s ” ) ;
86 i f ( y > 250)a s s e r t ( 0 ) ;
88 i f ( y != a )a s s e r t ( 0 ) ;
90 i f ( y < 0)a s s e r t ( 0 ) ;
92 }}
94
Possible solutions
Overapproxy ← ∗
Underapprox (concretization)y ← 0
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 15 / 36
Introduction
Problems with concretization
i n t main ( ) {96 r e a d ( x ) ;
i f ( x > 0){98 y = foo(x);
i f ( y > 150)100 p r i n t ( ” l e s s ” ) ;
i f ( y > 250)102 a s s e r t ( 0 ) ;
i f ( y != x )104 a s s e r t ( 0 ) ;
i f ( y < 0)106 a s s e r t ( 0 ) ;
}108 }
Loss in coverage! (unsound)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 16 / 36
Introduction
Problems with concretization
110 i n t main ( ) {r e a d ( x ) ;
112 i f ( x > 0){y = foo(x);
114 i f ( y > 150)p r i n t ( ” l e s s ” ) ;
116 i f ( y > 250)a s s e r t ( 0 ) ;
118 i f ( y != x )a s s e r t ( 0 ) ;
120 i f ( y < 0)a s s e r t ( 0 ) ;
122 }}
124
Loss in coverage! (unsound)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 16 / 36
Introduction
Problems with concretization
i n t main ( ) {126 r e a d ( x ) ;
i f ( x > 0){128 y = foo(x);
i f ( y > 150)130 p r i n t ( ” l e s s ” ) ;
i f ( y > 250)132 a s s e r t ( 0 ) ;
i f ( y != x )134 a s s e r t ( 0 ) ;
i f ( y < 0)136 a s s e r t ( 0 ) ;
}138 }
Loss in coverage! (unsound)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 16 / 36
Introduction
Problems with concretization
140 i n t main ( ) {r e a d ( x ) ;
142 i f ( x >= 0) {→ y = abs(x);
144 i f ( x > 100)i f ( y != x )
146 a s s e r t ( 0 ) ;}
148 }
False positive! (incompleteness—in a testing tool?)Is it a bug? (no, a conscious design decision)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 17 / 36
Introduction
Problems with concretization
150 i n t main ( ) {r e a d ( x ) ;
152 i f ( x >= 0) {→ y = abs(x);
154 i f ( x > 100)i f ( y != x )
156 a s s e r t ( 0 ) ;}
158 }
False positive! (incompleteness—in a testing tool?)Is it a bug? (no, a conscious design decision)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 17 / 36
Introduction
Problems with concretization
160 i n t main ( ) {r e a d ( x ) ;
162 i f ( x >= 0) {→ y = abs(x);
164 i f ( x > 100)i f ( y != x )
166 a s s e r t ( 0 ) ;}
168 }
False positive! (incompleteness—in a testing tool?)
Is it a bug? (no, a conscious design decision)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 17 / 36
Introduction
Problems with concretization
170 i n t main ( ) {r e a d ( x ) ;
172 i f ( x >= 0) {→ y = abs(x);
174 i f ( x > 100)i f ( y != x )
176 a s s e r t ( 0 ) ;}
178 }
False positive! (incompleteness—in a testing tool?)Is it a bug? (no, a conscious design decision)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 17 / 36
Introduction
Problems with concretization
180 i n t main ( ) {r e a d ( x ) ;
182 i f ( x >= 0){→ y = abs ( x ) ;
184 i f ( x > 100)i f ( y != x )
186 a s s e r t ( 0 ) ;}
188 }
Reproducibility (of tests)?
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 18 / 36
Introduction
Problems with concretization
190 i n t main ( ) {r e a d ( x ) ;
192 i f ( x >= 0){→ y = abs ( x ) ;
194 i f ( x > 100)i f ( y != x )
196 a s s e r t ( 0 ) ;}
198 }
Reproducibility (of tests)?
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 18 / 36
Introduction
Question
Can we regain soundness, completeness and reproducibility lost due toconcretizations?
Pandey, Kotcharlakota and Roy. Deferred concretization in symbolic execution via
fuzzing. ISSTA 2019.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 19 / 36
Introduction
Question
Can we regain soundness, completeness and reproducibility lost due toconcretizations?
Pandey, Kotcharlakota and Roy. Deferred concretization in symbolic execution via
fuzzing. ISSTA 2019.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 19 / 36
Introduction
Modeling heap memory
1 i n t main ( ) {r e a d ( x ) ;
3 a = m a l l o c ( 1 0 0 ) ;c l e a r ( a ) ;
5 a [ x ] = 5a s s e r t ( a [ x ] != a [2∗ x − 2 ] ) ;
7 }
1 i n t main ( ) {r e a d ( x ) ;
3 a = Ha ;f o r ( i n t i =0; i <100; i ++) w r i t e (Ha , i , 0) ;
5 w r i t e (Ha , x , 5) ;a s s e r t ( r e a d (Ha , x ) != r e a d (Ha , 2∗x − 2) ) ;
7 }
== Use the array theory to model H ==
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 20 / 36
Introduction
Modeling heap memory
i n t main ( ) {10 r e a d ( x ) ;
a = m a l l o c ( 1 0 0 ) ;12 c l e a r ( a ) ;
a [ x ] = 514 a s s e r t ( a [ x ] != a [2∗ x − 2 ] ) ;}
16
i n t main ( ) {10 r e a d ( x ) ;
a = Ha ;12 f o r ( i n t i =0; i <100; i ++) w r i t e (Ha , i , 0) ;
w r i t e (Ha , x , 5) ;14 a s s e r t ( r e a d (Ha , x ) != r e a d (Ha , 2∗x − 2) ) ;}
16
== Use the array theory to model H ==
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 20 / 36
Introduction
Modeling heap memory
i n t main ( ) {18 r e a d ( x ) ;
a = m a l l o c ( 1 0 0 ) ;20 c l e a r ( a ) ;
a [ x ] = 522 a s s e r t ( a [ x ] != a [2∗ x − 2 ] ) ;}
24
i n t main ( ) {18 r e a d ( x ) ;
a = Ha ;20 f o r ( i n t i =0; i <100; i ++) w r i t e (Ha , i , 0) ;
w r i t e (Ha , x , 5) ;22 a s s e r t ( r e a d (Ha , x ) != r e a d (Ha , 2∗x − 2) ) ;}
24
== Use the array theory to model H ==
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 20 / 36
Introduction
Array theory
Axioms
∀a ∀i ∀v (read(write(a, i, v), i) = v)
∀a ∀i ∀j ∀v (i 6= j → read(write(a, i, v), j) = read(a, j))
∀a ∀b ((∀i (read(a, i) = read(b, i)))→ a = b
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 21 / 36
Introduction
Virtualization
S2E2 enables symbolic execution for binaries running in a virtualized environment(QEMU), enabling whole system verification.
2https://s2e.systemsSubhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 22 / 36
Introduction
Applications of Symbolic Execution
Test-case generation (path coverage)
Program debugging [Chandra et al., ICSE 2011]
Program repair [Nguyen et al., ICSE 2013; Mechtaev et al. ICSE 2016]
Bucketing tests [Pham et al., FASE 2017]
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 23 / 36
Introduction
Applications of Symbolic Execution
Test-case generation (path coverage)
Program debugging [Chandra et al., ICSE 2011]
Program repair [Nguyen et al., ICSE 2013; Mechtaev et al. ICSE 2016]
Bucketing tests [Pham et al., FASE 2017]
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 23 / 36
Introduction
Applications of Symbolic Execution
Test-case generation (path coverage)
Program debugging [Chandra et al., ICSE 2011]
Program repair [Nguyen et al., ICSE 2013; Mechtaev et al. ICSE 2016]
Bucketing tests [Pham et al., FASE 2017]
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 23 / 36
Introduction
Applications of Symbolic Execution
Test-case generation (path coverage)
Program debugging [Chandra et al., ICSE 2011]
Program repair [Nguyen et al., ICSE 2013; Mechtaev et al. ICSE 2016]
Bucketing tests [Pham et al., FASE 2017]
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 23 / 36
Introduction
Angelic Degugging [Chandra et al. ICSE ’11]
Objective
Given a informal specification as a set of tests, can we identify which expressionsare ikely to be buggy?
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 24 / 36
Introduction
Angelic Degugging [Chandra et al. ICSE ’11]
200 i n t main ( ) {r e a d ( x , y , z ) ;
202 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
204 t3 = ( z >= x ) ;
206 i f ( t3 && t2 ) max = z ;i f ( t1 && ! t3 ) max = x ;
208 i f ( t2 && ! t1 ) max = y ;
210 output ( max ) ;}
212
What is the bug?
How to fix the bug?
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 25 / 36
Introduction
Angelic Degugging [Chandra et al. ICSE ’11]
i n t main ( ) {214 r e a d ( x , y , z ) ;
t1 = ( x >= y ) ;216 t2 = ( y >= z ) ;
t3 = ( z >= x ) ;218
i f ( t3 && t2 ) max = z ;220 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;222
output ( max ) ;224 }
What is the bug?
How to fix the bug?
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 25 / 36
Introduction
Angelic Degugging [Chandra et al. ICSE ’11]
226 i n t main ( ) {r e a d ( x , y , z ) ;
228 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
230 t3 = ( z >= x ) ;
232 i f ( t3 && t2 ) max = z ;i f ( t1 && ! t3 ) max = x ;
234 i f ( t2 && ! t1 ) max = y ;
236 output ( max ) ;}
238
What is the bug?
How to fix the bug?
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 25 / 36
Introduction
Angelic Degugging [Chandra et al. ICSE ’11]
i n t main ( ) {240 r e a d ( x , y , z ) ;
t1 = ( x >= y ) ;242 t2 = ( y >= z ) ;
t3 = ( z >= x ) ;244
i f ( t3 && t2 ) max = z ; // bug246 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;248
output ( max ) ;250 }
Test Input Output StatusI1 8, 2, 4 8 PassI2 1, 2, 4 0 FailI3 7, 5, 4 7 PassI4 2, 5, 1 5 Pass
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 26 / 36
Introduction
Angelic Degugging [Chandra et al. ICSE ’11]
252 i n t main ( ) {r e a d ( x , y , z ) ;
254 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
256 t3 = ( z >= x ) ;
258 i f ( t3 && t2 ) max = z ; // bugi f ( t1 && ! t3 ) max = x ;
260 i f ( t2 && ! t1 ) max = y ;
262 output ( max ) ;}
264
Test Input Output StatusI1 8, 2, 4 8 PassI2 1, 2, 4 0 FailI3 7, 5, 4 7 PassI4 2, 5, 1 5 Pass
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 26 / 36
Introduction
Approach
Define scope of debugging
E = All expressions (and subexpressions) within a user-specified scope.Expressions at distinct program locations are different expressions.
Test for angelic values on failing tests
∀e ∈ E : AngelicTest(P, I, e) = ∃α.Test(P [α/e], I)
Check for regressions on passing tests
∀e ∈ E : FlexTest(P, I, e) = ∃α.(Test(P [α/e], I) ∧ α 6= Eval(P, I, e)
Collect suspicious expressions
{e | e ∈ E ∧AngelicTest(P, If , e) ∧ ∀i ∈ [i..k].F lexTest(P, Ipi , e)}
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 27 / 36
Introduction
Approach
Define scope of debugging
E = All expressions (and subexpressions) within a user-specified scope.Expressions at distinct program locations are different expressions.
Test for angelic values on failing tests
∀e ∈ E : AngelicTest(P, I, e) = ∃α.Test(P [α/e], I)
Check for regressions on passing tests
∀e ∈ E : FlexTest(P, I, e) = ∃α.(Test(P [α/e], I) ∧ α 6= Eval(P, I, e)
Collect suspicious expressions
{e | e ∈ E ∧AngelicTest(P, If , e) ∧ ∀i ∈ [i..k].F lexTest(P, Ipi , e)}
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 27 / 36
Introduction
Approach
Define scope of debugging
E = All expressions (and subexpressions) within a user-specified scope.Expressions at distinct program locations are different expressions.
Test for angelic values on failing tests
∀e ∈ E : AngelicTest(P, I, e) = ∃α.Test(P [α/e], I)
Check for regressions on passing tests
∀e ∈ E : FlexTest(P, I, e) = ∃α.(Test(P [α/e], I) ∧ α 6= Eval(P, I, e)
Collect suspicious expressions
{e | e ∈ E ∧AngelicTest(P, If , e) ∧ ∀i ∈ [i..k].F lexTest(P, Ipi , e)}
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 27 / 36
Introduction
Approach
Define scope of debugging
E = All expressions (and subexpressions) within a user-specified scope.Expressions at distinct program locations are different expressions.
Test for angelic values on failing tests
∀e ∈ E : AngelicTest(P, I, e) = ∃α.Test(P [α/e], I)
Check for regressions on passing tests
∀e ∈ E : FlexTest(P, I, e) = ∃α.(Test(P [α/e], I) ∧ α 6= Eval(P, I, e)
Collect suspicious expressions
{e | e ∈ E ∧AngelicTest(P, If , e) ∧ ∀i ∈ [i..k].F lexTest(P, Ipi , e)}
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 27 / 36
Introduction
Angelic non-determinism
i n t main ( ) {266 r e a d ( x , y , z ) ;
t1 = ( x >= y ) ;268 t2 = ( y >= z ) ;
t3 = ( z >= x )270
i f (∗ ) max = z ; // a n g e l i c272 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;274
output ( max ) ;276 }
Say, E = {t1, t2, t3, !t1, !t2,!t3, t1 && !t3, t2 && !t1, t3&& t2}For e = (t3 && t2), value 1passes test
There exist alternate valuesfor e (i.e. for (t3 && t2)),different than the originalvalues, that still passes allpassing tests
So, e = (t3 && t2) is identifiedas a suspicious expression
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 28 / 36
Introduction
Angelic non-determinism
278 i n t main ( ) {r e a d ( x , y , z ) ;
280 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
282 t3 = ( z >= x )
284 i f (∗ ) max = z ; // a n g e l i ci f ( t1 && ! t3 ) max = x ;
286 i f ( t2 && ! t1 ) max = y ;
288 output ( max ) ;}
290
Say, E = {t1, t2, t3, !t1, !t2,!t3, t1 && !t3, t2 && !t1, t3&& t2}For e = (t3 && t2), value 1passes test
There exist alternate valuesfor e (i.e. for (t3 && t2)),different than the originalvalues, that still passes allpassing tests
So, e = (t3 && t2) is identifiedas a suspicious expression
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 28 / 36
Introduction
Understanding FlexTest
Ideal check
Given a suspicious expression e and a test I, does Test(P[e’/e], I) hold for analternate expression e’?
Requires us to know the repaired expression e’ !
Approximate check
Given a suspicious expression e and a test I, does Test(P[w’/e], I), hold for analternate value w’, w 6= w′, where w is the value for the expression e when P isrun on I?
Do we lose anything?
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 29 / 36
Introduction
Understanding FlexTest
Ideal check
Given a suspicious expression e and a test I, does Test(P[e’/e], I) hold for analternate expression e’?
Requires us to know the repaired expression e’ !
Approximate check
Given a suspicious expression e and a test I, does Test(P[w’/e], I), hold for analternate value w’, w 6= w′, where w is the value for the expression e when P isrun on I?
Do we lose anything?
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 29 / 36
Introduction
Understanding FlexTest
Ideal check
Given a suspicious expression e and a test I, does Test(P[e’/e], I) hold for analternate expression e’?
Requires us to know the repaired expression e’ !
Approximate check
Given a suspicious expression e and a test I, does Test(P[w’/e], I), hold for analternate value w’, w 6= w′, where w is the value for the expression e when P isrun on I?
Do we lose anything?
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 29 / 36
Introduction
Understanding FlexTest
Ideal check
Given a suspicious expression e and a test I, does Test(P[e’/e], I) hold for analternate expression e’?
Requires us to know the repaired expression e’ !
Approximate check
Given a suspicious expression e and a test I, does Test(P[w’/e], I), hold for analternate value w’, w 6= w′, where w is the value for the expression e when P isrun on I?
Do we lose anything?
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 29 / 36
Introduction
Approximate Check
i n t main ( ) {292 r e a d ( x , y , z ) ;
t1 = ( x >= y ) ;294 t2 = ( y >= z ) ;
t3 = ( z >= x ) ;296
i f ( t1 && ! t3 ) max = x ;298 i f ( t2 && ! t1 ) max = y ;
i f ( t3 && t2 ) max = z ; // bug300
output ( max ) ;302 }
A non-zero value of expression (t3 && t2) breaks other passing tests,violating FlexText!
So, we may lose on some suspicious expressions.
But in this case, there is another repair expression, t2 at the lastconditiona, that passes FlexTest
arecall that all expressions at distinct lines are different
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 30 / 36
Introduction
Approximate Check
304 i n t main ( ) {r e a d ( x , y , z ) ;
306 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
308 t3 = ( z >= x ) ;
310 i f ( t1 && ! t3 ) max = x ;i f ( t2 && ! t1 ) max = y ;
312 i f ( t3 && t2 ) max = z ; // bug
314 output ( max ) ;}
316
A non-zero value of expression (t3 && t2) breaks other passing tests,violating FlexText!
So, we may lose on some suspicious expressions.
But in this case, there is another repair expression, t2 at the lastconditiona, that passes FlexTest
arecall that all expressions at distinct lines are different
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 30 / 36
Introduction
Approximate Check
i n t main ( ) {318 r e a d ( x , y , z ) ;
t1 = ( x >= y ) ;320 t2 = ( y >= z ) ;
t3 = ( z >= x ) ;322
i f ( t1 && ! t3 ) max = x ;324 i f ( t2 && ! t1 ) max = y ;
i f ( t3 && t2 ) max = z ; // bug326
output ( max ) ;328 }
A non-zero value of expression (t3 && t2) breaks other passing tests,violating FlexText!
So, we may lose on some suspicious expressions.
But in this case, there is another repair expression, t2 at the lastconditiona, that passes FlexTest
arecall that all expressions at distinct lines are different
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 30 / 36
Introduction
Approximate Check
330 i n t main ( ) {r e a d ( x , y , z ) ;
332 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
334 t3 = ( z >= x ) ;
336 i f ( t1 && ! t3 ) max = x ;i f ( t2 && ! t1 ) max = y ;
338 i f ( t3 && t2 ) max = z ; // bug
340 output ( max ) ;}
342
A non-zero value of expression (t3 && t2) breaks other passing tests,violating FlexText!
So, we may lose on some suspicious expressions.
But in this case, there is another repair expression, t2 at the lastconditiona, that passes FlexTest
arecall that all expressions at distinct lines are different
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 30 / 36
Introduction
How to realize angelic non-determinism using SymbolicExecution: AngelicTest and FlexTest
i n t main ( ) {344 r e a d ( x , y , z ) ;
t1 = ( x >= y ) ;346 t2 = ( y >= z ) ;
t3 = ( z >= x )348
i f ( a = s y m b o l i c ( ) ) // ( t3 && t2 )350 max = z ;
assume ( a != ( t3 && t2 ) )352 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;354
output ( max ) ;356 assume ( max == expected max ) ;
}358
AngelicTest: Run program onconcrete inputs, fresh symbolicvariable for candidate expression,output assumed to be expectedoutput.
FlexTest: Run program on concreteinputs, fresh symbolic variable forcandidates, assume that thesymbolic variable takes a differentvalue than actual expression, outputassumed to be expected output.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 31 / 36
Introduction
How to realize angelic non-determinism using SymbolicExecution: AngelicTest and FlexTest
i n t main ( ) {360 r e a d ( x , y , z ) ;
t1 = ( x >= y ) ;362 t2 = ( y >= z ) ;
t3 = ( z >= x )364
i f ( a = s y m b o l i c ( ) ) // ( t3 && t2 )366 max = z ;
assume ( a != ( t3 && t2 ) )368 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;370
output ( max ) ;372 assume ( max == expected max ) ;
}374
AngelicTest: Run program onconcrete inputs, fresh symbolicvariable for candidate expression,output assumed to be expectedoutput.
FlexTest: Run program on concreteinputs, fresh symbolic variable forcandidates, assume that thesymbolic variable takes a differentvalue than actual expression, outputassumed to be expected output.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 31 / 36
Introduction
Discussion
Only works on 1-fixable programs
Evaluated on JTOPAS, an open-source Java library for parsing arbitrary text,with 10 seeded faults; could identify 4 of them.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 32 / 36
Introduction
What about repair? (SemFix, ICSE 2013)
i n t main ( ) {376 r e a d ( x , y , z ) ;
t1 = ( x >= y ) ;378 t2 = ( y >= z ) ;
t3 = ( z >= x ) ;380
// ( t3 && t2 )382 i f f ( x , y , z , t1 , t2 , t2 )
max = z ;384 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;386
output ( max ) ;388 }
Instead of non-determinism, maintain an uninterpreted function to“capture” the semantics of the correct expression;
Run the program on all inputs to collect “enough” examples for thesemantics of the correct expression;
Synthesize the correct expression according to the semantics.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 33 / 36
Introduction
What about repair? (SemFix, ICSE 2013)
390 i n t main ( ) {r e a d ( x , y , z ) ;
392 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
394 t3 = ( z >= x ) ;
396 // ( t3 && t2 )i f f ( x , y , z , t1 , t2 , t2 )
398 max = z ;i f ( t1 && ! t3 ) max = x ;
400 i f ( t2 && ! t1 ) max = y ;
402 output ( max ) ;}
404
Instead of non-determinism, maintain an uninterpreted function to“capture” the semantics of the correct expression;
Run the program on all inputs to collect “enough” examples for thesemantics of the correct expression;
Synthesize the correct expression according to the semantics.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 33 / 36
Introduction
What about repair? (SemFix, ICSE 2013)
i n t main ( ) {406 r e a d ( x , y , z ) ;
t1 = ( x >= y ) ;408 t2 = ( y >= z ) ;
t3 = ( z >= x ) ;410
// ( t3 && t2 )412 i f f ( x , y , z , t1 , t2 , t2 )
max = z ;414 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;416
output ( max ) ;418 }
Instead of non-determinism, maintain an uninterpreted function to“capture” the semantics of the correct expression;
Run the program on all inputs to collect “enough” examples for thesemantics of the correct expression;
Synthesize the correct expression according to the semantics.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 33 / 36
Introduction
What about repair? (SemFix, ICSE 2013)
420 i n t main ( ) {r e a d ( x , y , z ) ;
422 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
424 t3 = ( z >= x ) ;
426 // ( t3 && t2 )i f f ( x , y , z , t1 , t2 , t2 )
428 max = z ;i f ( t1 && ! t3 ) max = x ;
430 i f ( t2 && ! t1 ) max = y ;
432 output ( max ) ;}
434
Instead of non-determinism, maintain an uninterpreted function to“capture” the semantics of the correct expression;
Run the program on all inputs to collect “enough” examples for thesemantics of the correct expression;
Synthesize the correct expression according to the semantics.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 33 / 36
Introduction
What about repair? (SemFix, ICSE 2013)
i n t main ( ) {436 r e a d ( x , y , z ) ;
t1 = ( x >= y ) ;438 t2 = ( y >= z ) ;
t3 = ( z >= x ) ;440
// ( t3 && t2 )442 i f f ( x , y , z , t1 , t2 , t2 )
max = z ;444 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;446
output ( max ) ;448 }
Instead of non-determinism, maintain an uninterpreted function to“capture” the semantics of the correct expression;
Run the program on all inputs to collect “enough” examples for thesemantics of the correct expression;
Synthesize the correct expression according to the semantics.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 33 / 36
Introduction
SemFix
450 i n t main ( ) {r e a d ( x , y , z ) ;
452 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
454 t3 = ( z >= x ) ;
456 i f ( a = s y m b o l i c ( ) ) max = z ; //bug
assume ( a != ( t3 && t2 ) )458 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;460
output ( max ) ;462 }
Test f(x,y,z,t1,t2,t3)I1 f(8,2,4,1,0,0) = 0I2 f(1,2,4,0,0,1) = 1I3 f(7,5,4,1,1,0) = 0I4 f(2,5,1,0,1,0) = 0
(t3 && !t2) synthesized!
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 34 / 36
Introduction
SemFix
464 i n t main ( ) {r e a d ( x , y , z ) ;
466 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
468 t3 = ( z >= x ) ;
470 i f ( a = s y m b o l i c ( ) ) max = z ; //bug
assume ( a != ( t3 && t2 ) )472 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;474
output ( max ) ;476 }
Test f(x,y,z,t1,t2,t3)I1 f(8,2,4,1,0,0) = 0I2 f(1,2,4,0,0,1) = 1I3 f(7,5,4,1,1,0) = 0I4 f(2,5,1,0,1,0) = 0
(t3 && !t2) synthesized!
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 34 / 36
Introduction
SemFix
478 i n t main ( ) {r e a d ( x , y , z ) ;
480 t1 = ( x >= y ) ;t2 = ( y >= z ) ;
482 t3 = ( z >= x ) ;
484 i f ( a = s y m b o l i c ( ) ) max = z ; //bug
assume ( a != ( t3 && t2 ) )486 i f ( t1 && ! t3 ) max = x ;
i f ( t2 && ! t1 ) max = y ;488
output ( max ) ;490 }
Test f(x,y,z,t1,t2,t3)I1 f(8,2,4,1,0,0) = 0I2 f(1,2,4,0,0,1) = 1I3 f(7,5,4,1,1,0) = 0I4 f(2,5,1,0,1,0) = 0
(t3 && !t2) synthesized!
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 34 / 36
Introduction
Angelix [ICSE ’16]
What about multi-line repairs?
Challenge: The value from one repaired expression may be required to feedinto the repair of another expression:
The dependencies of which repairs feed into which, can be represented as aforest — angelic forest;The synthesis of repair expressions is done on this angelic forest;The angelic forest is independent of the size of the program, and only dependson the domain of candidate repair expressions.
The synthesis procedure can be seen as synthesizing higher-order functions(symbolic execution and the angelic forest, however, allow you to skip thiscomplexity and allows synthesis with values only)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 35 / 36
Introduction
Angelix [ICSE ’16]
What about multi-line repairs?
Challenge: The value from one repaired expression may be required to feedinto the repair of another expression:
The dependencies of which repairs feed into which, can be represented as aforest — angelic forest;The synthesis of repair expressions is done on this angelic forest;The angelic forest is independent of the size of the program, and only dependson the domain of candidate repair expressions.
The synthesis procedure can be seen as synthesizing higher-order functions(symbolic execution and the angelic forest, however, allow you to skip thiscomplexity and allows synthesis with values only)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 35 / 36
Introduction
Angelix [ICSE ’16]
What about multi-line repairs?
Challenge: The value from one repaired expression may be required to feedinto the repair of another expression:
The dependencies of which repairs feed into which, can be represented as aforest — angelic forest;
The synthesis of repair expressions is done on this angelic forest;The angelic forest is independent of the size of the program, and only dependson the domain of candidate repair expressions.
The synthesis procedure can be seen as synthesizing higher-order functions(symbolic execution and the angelic forest, however, allow you to skip thiscomplexity and allows synthesis with values only)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 35 / 36
Introduction
Angelix [ICSE ’16]
What about multi-line repairs?
Challenge: The value from one repaired expression may be required to feedinto the repair of another expression:
The dependencies of which repairs feed into which, can be represented as aforest — angelic forest;The synthesis of repair expressions is done on this angelic forest;
The angelic forest is independent of the size of the program, and only dependson the domain of candidate repair expressions.
The synthesis procedure can be seen as synthesizing higher-order functions(symbolic execution and the angelic forest, however, allow you to skip thiscomplexity and allows synthesis with values only)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 35 / 36
Introduction
Angelix [ICSE ’16]
What about multi-line repairs?
Challenge: The value from one repaired expression may be required to feedinto the repair of another expression:
The dependencies of which repairs feed into which, can be represented as aforest — angelic forest;The synthesis of repair expressions is done on this angelic forest;The angelic forest is independent of the size of the program, and only dependson the domain of candidate repair expressions.
The synthesis procedure can be seen as synthesizing higher-order functions(symbolic execution and the angelic forest, however, allow you to skip thiscomplexity and allows synthesis with values only)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 35 / 36
Introduction
Angelix [ICSE ’16]
What about multi-line repairs?
Challenge: The value from one repaired expression may be required to feedinto the repair of another expression:
The dependencies of which repairs feed into which, can be represented as aforest — angelic forest;The synthesis of repair expressions is done on this angelic forest;The angelic forest is independent of the size of the program, and only dependson the domain of candidate repair expressions.
The synthesis procedure can be seen as synthesizing higher-order functions(symbolic execution and the angelic forest, however, allow you to skip thiscomplexity and allows synthesis with values only)
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 35 / 36
Introduction
Concluding Remarks
In terms of industrial adoption, symbolic execution is perhaps one of themost successful outcomes from PL and SE research.
Engines like SAGE and KLEE are quite mature, and are being used routinelyin industry and academia.
Acknowledgements: Some of the slides are from the ISSTA 2019 talk of mystudent, Awanish Pandey.
Subhajit Roy (IIT Kanpur) Symbolic Execution November 8, 2019 36 / 36