hydiﬀ: hybrid differential software analysis · shallow bugs, but bad in ﬁnding deep program...

HyDiff Hybrid Differential Software Analysis

1yannicnollerhu-berlinde International Conference on Software Engineering (ICSE) 2020

Yannic Noller Corina S Pasareanu Marcel Boumlhme

Youcheng Sun Hoang Lam Nguyen Lars Grunske

Differential Analysis

yannicnollerhu-berlinde 2International Conference on Software Engineering (ICSE) 2020

input1

program P

input2

program P

=behavior1

behavior2

yinput

program P

program Prsquo

=behavior1

behavior2

Regression AnalysisSide-Channel Analysis

Robustness Analysis of Neural Networks

Problem Solution SummaryEvaluation

Inputprogram versions

seed input files

change-annotated program

Fuzzing Symbolic Execution

import

ICFGinstrumentation

assessment trie extension assessment

constraint solving input generation

exploration

mutate inputs

import

fuzzer outputqueue

Output

symbc outputqueue

input +odiff +ddiff +crash +cdiff +patch-dist +covid0001 X X Xid0002 X Xid0003 X X

hellip hellip hellip hellip hellip hellip hellip

set of divergence revealing test inputs

good in finding shallow bugs but bad in finding deep program

input reasoning ability but path explosion and

constraint solving

seed input files

24 differential program analysis 21

Table 1 Change-Annotations by Shadow Symbolic Execution [141]

Change Type Example

Update assignment x = x + change(E1 E2)

Update condition if(change(E1 E2))

Add extra assignment x = change(x E)

Remove assignment x = change(E x)

Add conditional if(change(false C))

Remove conditional if(change(C false))

Remove code if(change(true false))

Add code if(change(false true))

existing assignment x = x + change(E1 E2) the variable x holds two expressions x +

E2 for the new version and x + E1 for the old versionSSE performs dynamic symbolic-execution on such a unified program version which is

implemented in two phases (1) the concolic phase and (2) the bounded symbolic execution(BSE) phase

In the first phase SSE simply follows the concrete execution of test inputs from an ex-isting test suite while it checks for divergences along the control-flow of the two versionsThis exploration is driven by the idea of four-way forking In traditional symbolic executionevery branching condition introduces two forks to explore the true and false branchesShadow symbolic execution instead introduces four forks to investigate all four combina-tions of true and false branches for both program versions As long as there is no concretedivergence SSE follows the so-called sameTrue and sameFalse branch which denotes thatboth concrete executions take the same branches Additionally SSE checks the satisfiabilityof the path constraints for the other two branching options where both versions take dif-ference branches These branches are called diffTrue and diffFalse paths For every feasiblediff path SSE generates a concrete input and stores the divergence point for later explo-ration by the second phase As long there is no concrete divergence SSE continues untilthe end of the program

When SSE hits the mentioned addition or a removal of straightline code blocks itimmediately stores a divergence point This conservative handling leads to an over-approximation of the diff paths because the added deleted code may not necessarilylead to an actual divergence

The second phase performs bounded symbolic execution (BSE) only on the new versionfrom the stored divergence points to further investigate the divergences

At the end Palikareva et al [141] perform some post-processing of the generated inputsto determine whether they expose some observable differences eg by comparing theoutputs and the exit codes Palikareva et al [141] implemented their approach on top ofthe KLEE symbolic execution engine [33]

Limitations of Shadow Symbolic Execution Shadow symbolic execution as introducedby Palikareva et al [141] is driven by concrete inputs from an existing test suite While thisexploration strategy tries to focus on constraining the search space it might miss importantdivergences as it strongly depends on the quality of these initial test input In particular SSEmight miss deeper divergences in the BSE phase because of limiting prefixes in the pathconstraints Since BSE is started from the identified divergence points it inherits the pathconstraint prefix from the concrete input that has been followed to find this divergence Ingeneral when there are several paths from the beginning of the program to this divergence

[ March 29 2020 at 1153 ndash classicthesis version 01 ]

change-annotations by Palikareva et al [2]

input = change(input1 input2)

HyDiffrsquos Inputseed inputs

program under test

two different change types(1) inside the program code (2) in the input

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

Differential Greybox Fuzzing (DF)built upon AFL [1] (genetic algorithm)

mutant selection driven by differential heuristics output difference decision history difference cost difference patch distance

additionally guided by branch coverage

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

Differential Symbolic Execution (DSE)built upon Symbolic PathFinder (SPF) [3]

central data structure trie

node selection driven by differential heuristics decision history difference cost difference patch distance

Output input +odiff +ddiff +crash +cdiff +patch-dist +covid0001 X X Xid0002 X Xid0003 X X

HyDiffrsquos Outputset of generated inputs

classified by divergence output difference (+odiff) control-flow (+ddiff) crashing behavior (+crash) execution cost (+cdiff)

additionally patch distance (+patch-dist) branch coverage (+cov)

ExperimentsRegression Analysis

Side-Channel Analysis

HyDiff classifies all subjects correctly

significantly more output and decision differences

HyDiff shows good trade-off between DSE and DF

no significant amplification of the exploration

stress test for HyDiff

HyDiff significantly more output differences

9yannicnollerhu-berlinde

yannicnollerhydiff

International Conference on Software Engineering (ICSE) 2020

References

[1] Website American Fuzzy Lop (AFL) httplcamtufcoredumpcxafl

[2] Hristina Palikareva Tomasz Kuchta and Cristian Cadar 2016 Shadow of a Doubt Testing for Divergences between Software Versions In 2016 IEEEACM 38th International Conference on Software Engineering (ICSE) 1181ndash1192 https doiorg10114528847812884845

[3] Corina S Păsăreanu Willem Visser David Bushnell Jaco Geldenhuys Peter Mehlitz and Neha Rungta 2013 Symbolic PathFinder integrating symbolic execution with model checking for Java bytecode analysis Automated Software Engineering 20 3 (2013) 391ndash425 httpsdoiorg101007s10515-013-0122-2

Differential Analysis

input1

program P

input2

program P

=behavior1

behavior2

yinput

program P

program Prsquo

=behavior1

behavior2

Regression AnalysisSide-Channel Analysis

seed input files

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

Output

symbc outputqueue

constraint solving

seed input files

Change Type Example

program under test

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

yannicnollerhydiff

References

seed input files

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

Output

symbc outputqueue

constraint solving

seed input files

Change Type Example

program under test

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

yannicnollerhydiff

References

seed input files

Change Type Example

program under test

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

yannicnollerhydiff

References

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

yannicnollerhydiff

References

import

ICFGinstrumentation

exploration

mutate inputs

import

fuzzer outputqueue

symbc outputqueue

yannicnollerhydiff

References

yannicnollerhydiff

References

yannicnollerhydiff

References

yannicnollerhydiff

References

hydiﬀ: hybrid differential software analysis · shallow bugs, but bad in ﬁnding deep program...

Documents

bugs, bugs, bugs which are good? which are...

the agile security manifesto - synopsys · likewise, the...

some nysius bugs of concern to u...some nysius bugs of...

bugs world 3 audio cd piosenki, historyjki i wierszyki ·...

tpm/ipm weekly r eport - university of maryland · similar...

beating back bed bugs - contra costa county · beating back...

mini 4h bugs · some bugs crawl on the ground, and some...

bugs for bugs update autumn 2017 · suite of beneﬁcial...

blockchain security! · blockchain bugs? • design bugs:...

control and sequence bugs, requirements features...

nature series environmental centers dedicated to nature...

good bugs & bad bugs

ila-mcm: integrating memory consistency models with...

bigeyed bugs, geocoris spp. (insecta: hemiptera: … nysius...

andreas grabner your place in devtops is not about finding...

automatic software repair: a bibliography · ware repair...

things that make you say bed bugs - the supportive … that...

motif ﬁnding - fu-berlin.de

1 win xp/vista/win7+++ win 2000. 2 bugs, bugs and bugs

all about bed bugs · 2019-05-14 · bed bugs feed on human...