Statically Validating Must Summaries for Incremental Compositional
Dynamic Test Generation
Patrice Godefroid Shuvendu K. Lahiri Cindy Rubio-González
International Static Analysis Symposium – September 2011
Microsoft Research University of Wisconsin – Madison
Background
• Systematic Dynamic Test Generation (= DART)
  o Run program on a valid input and record the trace
  o Symbolically execute program along the recorded trace to collect constraints
  o Negate and solve constraints to obtain new inputs
  o And the process repeats (possibly forever!)
• Used in many tools
  o EXE, CUTE, SAGE, PEX, KLEE, BitScope, Apollo, etc.
SAGE @ Microsoft
• #1 application for SMT solvers today (CPU usage)
  o 1st whitebox fuzzer for security testing
  o 200+ machines (since 2008)
  o 1 billion+ constraints
  o 100s of apps, 100s of security bugs
  o Example: Win7 file fuzzing found ~1/3 of all fuzzing bugs
  o Millions of dollars saved for Microsoft + time/energy for the world
Compositional Test Generation
Systematically executing all feasible paths does not scale
Compositional Dynamic Test Generation
• Compute summaries that can be reused later
• Avoid retesting
• Can provide the same path coverage exponentially faster!
Example of Function Summary
int is_positive(int x) {
  if (x > 0) return 1;
  return 0;
}
Summary: $(x > 0 \wedge \mathit{ret} = 1) \vee (x \le 0 \wedge \mathit{ret} = 0)$
where ret denotes the value returned by the function is_positive
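As a rough illustration, the summary one would expect for is_positive, (x > 0 ∧ ret = 1) ∨ (x ≤ 0 ∧ ret = 0), can be written as an executable predicate and compared against the function on a bounded range of inputs. The helper names summary_is_positive and summary_holds_on_range are hypothetical, and the exhaustive loop only stands in for the constraint solver a real tool would use.

```c
#include <assert.h>
#include <stdbool.h>

/* Function from the slide. */
int is_positive(int x) {
    if (x > 0) return 1;
    return 0;
}

/* Hypothetical encoding of the summary
 * (x > 0 AND ret = 1) OR (x <= 0 AND ret = 0). */
bool summary_is_positive(int x, int ret) {
    return (x > 0 && ret == 1) || (x <= 0 && ret == 0);
}

/* Checks the summary against the function for every x in [lo, hi]. */
bool summary_holds_on_range(int lo, int hi) {
    for (int x = lo; x <= hi; x++) {
        if (!summary_is_positive(x, is_positive(x))) return false;
    }
    return true;
}
```

Calling summary_holds_on_range(-1000, 1000) checks every input in that range; a pair such as (x = 5, ret = 0) is rejected by the predicate because neither disjunct covers it.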
Function Summaries
• Function summary for a function f
  o Logic formula over constraints
  o Derived by successive iterations and defined as a disjunction of formulas
    $\varphi_w^f = \mathit{pre}_w^f \wedge \mathit{post}_w^f$
    where $\mathit{pre}_w^f$ is a conjunction of constraints on the inputs of f
    and $\mathit{post}_w^f$ is a conjunction of constraints on the outputs of f
  o Can be computed automatically from the path constraint for the intraprocedural path
Must Summaries
• Symbolic execution of large programs is imprecise
  o Complex program statements
  o Calls to operating-system and library functions
• Concrete values are used to simplify constraints
  o Under-approximate path constraints
  o Summaries become must summaries
int g(int x, int y) {
  if ((x > 0) && (hash(y) > 10))
    return 1;
  return 0;
}
Assume hash is a complex or unknown function
Assume that if g is invoked with y = 45, then hash(45) = 987
$\varphi_g = (x > 0 \wedge y = 45 \wedge \mathit{ret} = 1)$
Under-approximate, with a smaller precondition
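The under-approximation can be sketched in C as follows. Only hash(45) = 987 comes from the slide; the body of hash for every other input is a made-up stand-in for the complex or unknown function, and pre_g, post_g and must_summary_holds are hypothetical helper names. The check shows the one-directional nature of a must summary: whenever the precondition x > 0 ∧ y = 45 holds, ret = 1 must hold, while inputs outside the precondition are not described at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the complex or unknown hash function.
 * Only hash(45) = 987 comes from the slide; all other values are made up. */
int hash(int y) {
    if (y == 45) return 987;
    return (int)(((unsigned)y * 2654435761u) % 1000u);
}

/* Function from the slide. */
int g(int x, int y) {
    if ((x > 0) && (hash(y) > 10))
        return 1;
    return 0;
}

/* Precondition of the must summary: x > 0 AND y = 45. */
bool pre_g(int x, int y) { return x > 0 && y == 45; }

/* Postcondition of the must summary: ret = 1. */
bool post_g(int ret) { return ret == 1; }

/* For every x in [lo, hi] with y fixed to 45: if the precondition holds,
 * the postcondition must hold. Other inputs are left unconstrained. */
bool must_summary_holds(int lo, int hi) {
    for (int x = lo; x <= hi; x++) {
        if (pre_g(x, 45) && !post_g(g(x, 45))) return false;
    }
    return true;
}
```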
Must Summaries
• Defined as quadruple ⟨lp, P, lq, Q⟩ where:
  o lp and lq are locations in the program Prog
  o P is the summary precondition holding at lp
  o Q is the summary postcondition holding at lq
Some Facts About Summaries
• Time to be produced: weeks/months
• Number of summaries: millions
• Number of instructions executed between lp and lq: can be hundreds of thousands
Incremental Compositional Test Generation
Have to start from scratch if there is a small code change
Incremental compositional test generation
• As in smart/selective regression testing
• Reuse summaries still valid in new program
• Recompute invalid summaries
Must Summary Checking
• Given a valid must summary for a program and a new version of the program, is the summary still valid for the new version?
• Intraprocedural summaries
  o locations lp and lq are in the same function f
  o function f does not return between lp and lq when the summary is generated
Some proposals
• Naïve
  o For each summary, record executed instructions
  o Too expensive: ~100K instructions executed, runtime overhead
• Our proposal
  o Verify statically which summaries are still valid in order to reuse them
  o Less precise than recomputing summaries from scratch, but cheaper
Algorithms
1. Static Change Impact Analysis
2. Predicate-Sensitive Change Impact Analysis
3. Must Summary Validity Checking Analysis
Phase 1: Static Change Impact Analysis
• Impact analysis of code changes in the control-flow and call graphs of the program
[Figure: old program and new program, each with locations lp and lq]
Modified Instructions and Functions
• Instruction i of a program Prog is modified if:
  o i is changed or deleted in Prog’ or
  o Its ordered set of immediate successors has changed
• Function f in a program Prog is modified if f:
  o contains a modified instruction
  o calls a modified function
  o calls an unknown function
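The function-level rule above amounts to a fixed-point computation over the call graph. The sketch below is one possible way to write it, assuming a toy call graph stored as an adjacency matrix; NFUNCS, mark_modified and the array names are hypothetical and not taken from the authors' implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NFUNCS 5  /* size of the toy call graph (hypothetical) */

/* calls[f][g] is true when function f calls function g.
 * On return, modified[f] is true when f contains a modified instruction,
 * or calls (directly or transitively) a modified function,
 * or calls an unknown function. */
void mark_modified(bool calls[NFUNCS][NFUNCS],
                   const bool has_modified_instr[NFUNCS],
                   const bool unknown[NFUNCS],
                   bool modified[NFUNCS]) {
    for (int f = 0; f < NFUNCS; f++)
        modified[f] = has_modified_instr[f];

    /* Iterate to a fixed point: each pass can only flip entries to true. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int f = 0; f < NFUNCS; f++) {
            if (modified[f]) continue;
            for (int g = 0; g < NFUNCS; g++) {
                if (calls[f][g] && (modified[g] || unknown[g])) {
                    modified[f] = true;
                    changed = true;
                    break;
                }
            }
        }
    }
}
```

With a chain 0 → 1 → 2 in which only function 2 contains a modified instruction, all three functions end up marked, while a function that calls nothing and has no modified instruction stays unmarked.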
Phase 1: Static Change Impact Analysis
1. Construct call graph for the program
2. Find modified (M) and unknown (U) functions
3. Find indirectly modified (IM) and indirectly unknown (IU) functions
4. Map summaries (S), construct control-flow graphs
5. Mark summaries as valid or invalid
[Figure: call graph whose nodes are labeled M, U, IM or IU, with summaries S attached to functions]
Phase 2: Predicate-Sensitive Change Impact Analysis
• Exploit the predicates P and Q in a summary
[Figure: control-flow graph of the old program between lp and lq: if (x > 0), then if (y == 10) w = w + 1 else w = 0; otherwise w = 1. The summary with P: x > 0 ∧ y < 10 and Q: w = 0 is invalidated by Phase 1]
Phase 2: Predicate-Sensitive Change Impact Analysis
Old program
void foo() {
  ...
lp:
  if (x > 0) {
    if (y == 10) w++;   // MODIFIED
    else w = 0;
  }
  else {
    w = 1;              // MODIFIED
  }
lq:
  ...
  return;
}
P: x > 0 ∧ y < 10
Q: w = 0
Phase 2: Predicate-Sensitive Change Impact Analysis
Instrumented old program
void foo() {
  goto lp;
  ...
lp:
  assume P;
  modified = false;
  if (x > 0) {
    if (y == 10) { modified = true; w++; }
    else w = 0;
  }
  else { modified = true; w = 1; }
lq:
  assert(Q ⇒ ¬modified);
  ...
  return;
}
P: x > 0 ∧ y < 10
Q: w = 0
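A bounded, executable approximation of this check can be sketched in C. Two assumptions are made here: the assertion at lq is read as Q ⇒ ¬modified, since the connective is missing in the transcript, and an exhaustive loop over a small input range replaces the theorem prover. The names precondition, p_slide, p_weaker, phase2_check_state and phase2_validates are hypothetical; p_weaker is an invented weaker precondition, included only to show a failing case.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*precondition)(int x, int y);

/* P from the slide: x > 0 AND y < 10. */
bool p_slide(int x, int y) { return x > 0 && y < 10; }

/* Hypothetical weaker precondition, used only for contrast. */
bool p_weaker(int x, int y) { (void)y; return x > 0; }

/* Runs the instrumented old code from lp to lq on one input state.
 * Returns false only when the assertion at lq would fail. */
bool phase2_check_state(precondition pre, int x, int y, int w) {
    if (!pre(x, y)) return true;           /* assume P */

    bool modified = false;
    if (x > 0) {
        if (y == 10) { modified = true; w++; }
        else w = 0;
    } else {
        modified = true;
        w = 1;
    }

    bool q = (w == 0);                     /* Q: w = 0 */
    return !q || !modified;                /* assert(Q => not modified) */
}

/* Bounded stand-in for the verifier: tries every state in [lo, hi]^3. */
bool phase2_validates(precondition pre, int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int w = lo; w <= hi; w++)
                if (!phase2_check_state(pre, x, y, w)) return false;
    return true;
}
```

Under P from the slide, y < 10 rules out the modified branch y == 10, so no state reaches lq with modified set. Under the weaker precondition, the state x = 1, y = 10, w = -1 executes the modified w++ and still satisfies Q, so the assertion fails and the summary would not be validated.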
Phase 2: Predicate-Sensitive Change Impact Analysis
• Check that the assertion in the instrumented code does not fail for any possible input
• Verification-condition based program verifier
  o Create logic formula from program with assertions
  o Check formula validity using theorem prover
  o If valid, the assertion does not fail in any execution
Phase 3: Must Summary Validity Checking
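• Check must summary validity against some code, independently of code changes
[Figure: control-flow graph between lp and lq: if (x < 0), then if (y < 0) r = 1 else r = 0 in the old program, changed to r = 4 in the new program; otherwise w = 1. The summary with P: x < 0 and Q: r ≥ 0 is invalidated by Phase 1 and Phase 2]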
Phase 3: Must Summary Validity Checking
New program
void bar() {
  ...
lp:
  if (x < 0) {
    if (y < 0) r = 1;
    else {
      r = 4;   // r = 0 in old code
    }
  }
lq:
  ...
  return;
}
P: x < 0
Q: r ≥ 0
Phase 3: Must Summary Validity Checking
Instrumented new program
void bar() {
  reach_lq = false;
  goto lp;
  ...
lp:
  assume P;
  if (x < 0) {
    if (y < 0) r = 1;
    else {
      r = 4;   // r = 0 in old code
    }
  }
lq:
  assert(Q);
  reach_lq = true;
  ...
  assert(reach_lq);
  return;
}
P: x < 0
Q: r ≥ 0
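The Phase 3 instrumentation can be approximated in the same bounded style. This is a sketch under two assumptions: Q is read as r ≥ 0, because the relational operator is missing in the transcript, and a loop over a small range replaces the theorem prover. The parameter new_else_value and the function names are hypothetical; the parameter makes it possible to try values other than the r = 4 of the new program.

```c
#include <assert.h>
#include <stdbool.h>

/* Runs the instrumented new code from lp to lq on one input state.
 * new_else_value is the value assigned in the changed else branch
 * (4 in the new program on the slide, 0 in the old one).
 * Returns false when one of the two assertions would fail. */
bool phase3_check_state(int new_else_value, int x, int y, int r) {
    bool reach_lq = false;

    if (!(x < 0)) return true;        /* assume P: x < 0 */

    if (x < 0) {
        if (y < 0) r = 1;
        else r = new_else_value;
    }

    if (!(r >= 0)) return false;      /* assert(Q), Q read as r >= 0 */
    reach_lq = true;

    /* assert(reach_lq): trivially true in this loop-free fragment */
    return reach_lq;
}

/* Bounded stand-in for the verifier: tries every state in [lo, hi]^3. */
bool phase3_validates(int new_else_value, int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int r = lo; r <= hi; r++)
                if (!phase3_check_state(new_else_value, x, y, r)) return false;
    return true;
}
```

With r = 4 in the changed branch the summary is validated even though the code between lp and lq was modified, which is the situation Phases 1 and 2 cannot handle. A change to a negative value, such as -1, would make assert(Q) fail.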
Phase 3: Must Summary Validity Checking
• Check that assertions hold in the instrumented program for all possible inputs
Result
• Validated summaries can be reused
  o Because of soundness
• Invalidated summaries are discarded and need to be recomputed
  o New tests are generated to cover their preconditions
• Algorithms can be used in isolation or in a pipeline
Experimental Results
Implementation Details
[Figure: tool pipeline. Inputs are the old DLLs with their summaries (produced by SAGE) and the new DLLs. A C++ component built on Vulcan, a library to statically analyze Windows binaries, maps summaries and finds modified instructions and functions. Phase 1 (Change Impact), Phase 2 (Predicate Sensitive) and Phase 3 (Validity Checking), used in pipeline or in isolation, then output the valid/invalid summaries]
Implementation Details
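[Figure: a procedure (x86), obtained through Vulcan, and a summary ⟨lp, P, lq, Q⟩ are given to a translator from x86 to BoogiePL (sound translation). The translator emits an instrumented BPL file (Phase 2 or Phase 3), which is checked by Boogie/Z3]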
Benchmarks
• Image parsers embedded in Windows
  o ANI, GIF and JPEG
• Ran SAGE to generate summaries (small sample)
  o 286 for ANI, 288 for GIF and 517 for JPEG
• Identified the DLLs involved
  o 3 for ANI, 4 for GIF and 8 for JPEG
• Compared old version against a randomly picked newer version
  o Delta ~1 to 3 years
Difference Between Program Versions
Number of Functions per Benchmark
  ANI     6978
  GIF    13897
  JPEG   20357
• Modified functions: 3% - 10%
• Indirectly modified functions: 30% - 45%
• Unknown functions: 27% - 37%
• Indirectly unknown functions: 60% - 74%
Applying Phases in Isolation
Number of validated summaries, each phase applied in isolation
Benchmark               Phase 1      Phase 2      Phase 3      Total validated
ANI (286 summaries)     167 (58%)    244 (85%)     86 (30%)    256/286 (90%)
GIF (288 summaries)     198 (69%)    264 (92%)     90 (31%)    274/288 (95%)
JPEG (517 summaries)    317 (61%)    487 (94%)    173 (33%)    501/517 (97%)
Phase 1: Change Impact, Phase 2: Predicate Sensitive, Phase 3: Validity Checking
Applying Phases in Pipeline Fashion
Phase 1 → Phase 2 → Phase 3
Number of summaries newly validated by each phase of the pipeline
Benchmark               Phase 1      Phase 2      Phase 3      Total validated
ANI (286 summaries)     167 (58%)     77 (27%)     12 (4%)     256/286 (90%)
GIF (288 summaries)     198 (69%)     73 (25%)      3 (1%)     274/288 (95%)
JPEG (517 summaries)    317 (61%)    179 (35%)      5 (1%)     501/517 (97%)
Phase 1: Change Impact, Phase 2: Predicate Sensitive, Phase 3: Validity Checking
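The pipeline figures can be cross-checked with a few lines of C: the counts validated by the three phases should add up to the reported totals, and the percentages should follow from the number of summaries per benchmark. The struct and function names below are hypothetical; the numbers are the ones reported on the slide.

```c
#include <assert.h>

/* Per-benchmark pipeline counts as reported on the slide. */
struct pipeline_result {
    const char *name;
    int total;         /* summaries generated by SAGE */
    int validated[3];  /* newly validated by Phase 1, Phase 2, Phase 3 */
};

/* Sum of the summaries validated across the three pipeline phases. */
int total_validated(const struct pipeline_result *r) {
    return r->validated[0] + r->validated[1] + r->validated[2];
}

/* Percentage of part in whole, rounded to the nearest integer,
 * using integer arithmetic only. */
int percent(int part, int whole) {
    return (200 * part + whole) / (2 * whole);
}
```

For ANI, 167 + 77 + 12 gives the reported 256 of 286 summaries, which rounds to 90%. The GIF and JPEG rows add up in the same way to 274 and 501.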
Running Time (Isolation)
Running time in minutes, each phase applied in isolation
Benchmark    Phase 1    Phase 2    Phase 3
ANI             5         37         42
GIF             8         23         35
JPEG           12         31         37
Phase 1: Change Impact, Phase 2: Predicate Sensitive, Phase 3: Validity Checking
Running Time (Pipeline)
Phase 1 → Phase 2 → Phase 3
Total running time: 43 min, 28 min and 41 min (bars in the order ANI, GIF, JPEG)
[Figure: stacked bar chart of running time in minutes per benchmark, broken down into mapping etc., Phase 1, Phase 2 and Phase 3]
Preliminary results show that statically validating must summaries is up to 20 times faster than recomputing them!
Phase 1: Change Impact, Phase 2: Predicate Sensitive, Phase 3: Validity Checking
Summary
• Formulated the problem of statically validating must summaries
• Demonstrated the effectiveness of static must summary checking
  o Validated hundreds of must summaries in minutes
• Described three approaches for validating must summaries
• Presented a preliminary evaluation on three large Windows image parsers
Questions?