
Post on 14-Dec-2015


#1

Patches As Better Bug Reports

Wes Weimer
University of Virginia
[email protected]

Patchmaker, patchmaker
Make me a patch!
Find a fix for
The bugs that you catch!

-- Developer on the Roof

#2

Motivation

• Bad: Software ships with known bugs
• Bad: Average cost to field-fix a bug = $10k
• Good: Analysis tools can find bugs before you ship (and without the cost of testing)
  – Splint, SLAM, BLAST, ESC, MC, JPF, Bandera, FindBugs, WN, MOPS, BANE, MAGIC, PREfix, Saturn, …
• Bad: Currently, only 40% of tool-reported bugs get fixed!

#3

Our Goal

• Make tool-generated bug reports more likely to be addressed by a developer
• Current reports: counterexample backtrace
• Proposed reports: also explanatory patch
• How to get there?
  – Finite State Machines
  – Fuzzy String Matching
  – Plus path predicates and other techniques …
• Does It Work? Preliminary Results

#4

This Talk

• Motivation
• Finding Bugs, Specifications
• Counterexamples
• Explanatory Patches
• Patch Generation Algorithm
• Experimental Results

#5

Finding Bugs: Specifications

• To say that a program is wrong you need a formal specification of correct behavior
  – Usually a partial correctness policy
  – “I use locks correctly” not “I am a database”
• Policy = Finite State Machine
  – One FSM per object
  – Edges = Program Events
  – Must end in an accepting state

[Figure: socket policy FSM with states 1–5 and edges labeled s = socket(…), bind(s, …), listen(s, …), t = accept(s, …), read(t, …), write(t, …), close(t)]

#6

Finding Bugs: Counterexamples

• Bug Finding Problem: Given a program and a policy, return an execution path along which the program violates the policy

1: Socket s = socket(AF_UNIX, …);
2: bind(s, …);
3: if ( … )
4:   listen(s, 0);
5: else
6:   do_nothing(); // oops!
7: Socket t = accept(s, …);
8: write(t, "franz kafka", 12);
9: close(t);

Bug Report: On path 1-2-3-6-7-8-9 we forget to call listen().

#7

OK, Why Was That A Bug?

• On path 1-2-3-6-7-8-9 the events are:
  – socket, bind, accept, write, close
• That string is not accepted by our policy, so that path has a bug

[Figure: the socket policy FSM from slide #5, repeated]
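The acceptance check on this slide can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the names `SOCKET_POLICY` and `accepts` are hypothetical, and because the FSM figure did not survive extraction, the edge targets and the choice of accepting state are assumptions, not the policy from the talk.

```python
# Hypothetical encoding of the socket policy as a DFA.
# ASSUMPTION: edge targets and the accepting state are guessed, since the
# slide's figure is not available; only the event names come from the slide.
SOCKET_POLICY = {
    "start": 1,
    "accepting": {4},
    "delta": {
        (1, "socket"): 2,
        (2, "bind"): 3,
        (3, "listen"): 4,
        (4, "accept"): 5,
        (5, "read"): 5,
        (5, "write"): 5,
        (5, "close"): 4,
    },
}


def accepts(policy, events):
    """Return True if the policy DFA accepts the sequence of events."""
    state = policy["start"]
    for event in events:
        key = (state, event)
        if key not in policy["delta"]:
            return False  # no edge for this event here: the policy is violated
        state = policy["delta"][key]
    return state in policy["accepting"]


# Events along path 1-2-3-6-7-8-9 (listen is skipped) and along 1-2-3-4-7-8-9.
buggy_path = ["socket", "bind", "accept", "write", "close"]
fixed_path = ["socket", "bind", "listen", "accept", "write", "close"]

print(accepts(SOCKET_POLICY, buggy_path))  # False
print(accepts(SOCKET_POLICY, fixed_path))  # True
```

The buggy path is rejected at `accept`, because under the assumed edges state 3 has no outgoing edge for that event.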

#8

Bug-Finding In Real Life

• At MS this is done before most CVS check-ins
  – e.g., cannot commit code if PREfast finds a bug in it with respect to a project-specific policy
• Open Source projects use similar tools
  – e.g., splint, FindBugs, Valgrind, PMD, …
• Unfortunately, only about 40% of tool-reported bugs get fixed

#9

Unfixed Bugs

• Bug reports can be hard to understand
  – Dev not used to analyses, does not understand policy, did not write code, etc.

“Dealing with an error is often an onerous task, even with a detailed failing run in hand.” [G04]

“There is significant room for improving users’ experiences . . . An error trace can be very lengthy and only indicates the symptom . . . users may have to spend considerable time inspecting an error trace to understand the cause.” [BNR03]

“Even a detailed trace of how a system violates a specification may not provide enough information to easily understand (much less remedy) the problem.” [GV03]

#10

Explanatory Patches

• In addition to the bug report, also include an explanatory patch that would fix that bug (without making it worse)

• Given the code, the policy, and the violating path, find a valid sequence of events for that path

• Using path predicates and dataflow we can convert a valid sequence of events for one path into a patch for the whole program

#11

Patch Example

/* HSQLDB’s jdbcConnection.executeHTTP() */
2708: URLConnection c = url.openConnection();
2709: c.setDoOutput(true);
2710: OutputStream os = c.getOutputStream();
2711:
2712: os.write(p.getBytes(ENCODING));
2713: os.close();
2714: c.connect();

/* explanatory patch */
  c.setDoOutput(true);
  OutputStream os = c.getOutputStream();
+ try {
  os.write(p.getBytes(ENCODING));
- os.close();
+ } finally {
+ os.close();
+ }

Path = … 2708, 2709, 2710, 2711, 2712, Exception, 3000
Buggy Events: getOutputStream, write
Valid Events: getOutputStream, write, close

#12

Algorithm Intuition

• Given a policy DFA P and violating path v
• Find a string c such that c ∈ L(P) and c is “closest to” v
• Then c is our candidate patch
  – “The nearest thing that would have worked”
• Closest? Use an edit distance metric M
  – Gives a “cost” for insertions and deletions
  – e.g., M(“vermin”, “ermine”) = “del v, ins e”
  – Supplied by user (assume costs ins=1, del=2)
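To make the metric concrete, here is a minimal sketch of an insert/delete-only edit distance, using the costs assumed on the slide (ins=1, del=2). The function name `edit_distance` is hypothetical, and using one uniform cost per operation instead of per-symbol costs is a simplifying assumption.

```python
def edit_distance(v, c, ins_cost=1, del_cost=2):
    """Cheapest cost of turning v into c using only insertions and deletions."""
    rows, cols = len(v) + 1, len(c) + 1
    # dist[i][j] = cheapest cost of turning v[:i] into c[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost  # delete every symbol of v[:i]
    for j in range(1, cols):
        dist[0][j] = j * ins_cost  # insert every symbol of c[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            best = min(dist[i - 1][j] + del_cost,  # delete v[i-1]
                       dist[i][j - 1] + ins_cost)  # insert c[j-1]
            if v[i - 1] == c[j - 1]:
                best = min(best, dist[i - 1][j - 1])  # keep a matching symbol
            dist[i][j] = best
    return dist[-1][-1]


# "del v, ins e" costs 2 + 1 = 3 under these weights.
print(edit_distance("vermin", "ermine"))  # 3
```

Substitution is deliberately absent: the slide's metric prices only insertions and deletions, so replacing a symbol costs one deletion plus one insertion.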

#13

Patch Generation Problem

• Given:
  – non-empty policy DFA P = <Σ, S, s0, δ, F>
  – violating path v ∈ Σ*
  – edit distance metric M : Σ* × Σ* → N
    • edit distance insert cost Mi : Σ → N
    • edit distance delete cost Md : Σ → N
• Produce:
  – candidate patch c ∈ L(P)
  – such that M(v,c) is a global minimum
    • ¬∃c′ ∈ L(P). M(v,c′) < M(v,c)
  – (and explain how to get from v to c)

#14

Patch Generation Result

• We can compute such a c in polynomial time.
  – Note that c does not introduce any new bugs with respect to P.
• Intuition:
  – Construct NFA P1 that accepts v if ∃c ∈ L(P) such that M(v,c) ≤ 1.
  – If v ∈ L(P1) then we are done (return c).
  – Otherwise construct P2, try again, etc.
  – Does this terminate?

#15

We Can Always Find c

• Will find c in O(|S|) iterations.
• Why? ∃c ∈ L(P) with M(c,v) in O(|S|)
  – Proof: P is non-empty, so ∃x ∈ L(P) with |x| in O(|S|). Let D be the greatest deletion cost Md in M. Let I be the greatest insertion cost Mi in M. Then:
  – M(c,v) ≤ D|v| + I|x| = (D + I) × O(|S|) = O(|S|)
    • “Delete every char in v and insert every char in x”

#16

Construction Plan

• How do we make Pk?
  – Intuition: “layer cake” or “product construction”
• Given P = <Σ, S, s0, δ, F>
• Let Pk = <Σ, S′, s0′, δ′, F′> with S′ = S × {0, …, k}
• Reaching <s, j> on input x to Pk means that there is a string y with M(x, y) ≤ j such that P reaches s on input y.
• “Make k copies of P, each of which is one more edit distance unit away from P”

#17

Pk Construction Example

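• k = 2
• Md = 2
• Mi = 1
• L(P) = { “xyz” }

[Figure: P is the chain A -x-> B -y-> C -z-> D. P2 consists of layered copies of P with states <A,j>, <B,j>, <C,j>, <D,j> for j = 0, 1, 2. Edges labeled ε (ins x), ε (ins y), ε (ins z) lead from one layer to the next; edges labeled x (del x), y (del y), z (del z) consume a symbol without advancing in P.]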

#18

Using Pk

• P2 accepts “xz” using x, (ins y), z
• P2 accepts “zxyz” using (del z), x, y, z
• P2 does not accept “xx”. M(“xx”, “xyz”) = 4
• Checking if Pk accepts v takes O(|S||v|)

[Figure: P2, repeated from the previous slide]
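The three claims on this slide can be checked with a small sketch that builds the layered NFA explicitly and simulates it. The helper names (`build_pk`, `eps_closure`, `pk_accepts`) and the dictionary encoding of P are hypothetical, and uniform costs (ins=1, del=2) stand in for the per-symbol costs Mi and Md.

```python
# Hypothetical encoding of the example policy P with L(P) = {"xyz"}.
XYZ = {
    "states": {"A", "B", "C", "D"},
    "alphabet": {"x", "y", "z"},
    "start": "A",
    "accepting": {"D"},
    "delta": {("A", "x"): "B", ("B", "y"): "C", ("C", "z"): "D"},
}


def build_pk(dfa, k, ins_cost, del_cost):
    """Return the symbol edges and epsilon edges of the layered NFA P_k."""
    step = {}  # (state, layer, symbol) -> set of (state, layer)
    eps = {}   # (state, layer) -> set of (state, layer)
    for j in range(k + 1):
        for (s, x), t in dfa["delta"].items():
            # normal edge: advance in P and stay within layer j
            step.setdefault((s, j, x), set()).add((t, j))
            # insertion edge: advance in P on a made-up x, consuming no input
            if j + ins_cost <= k:
                eps.setdefault((s, j), set()).add((t, j + ins_cost))
        if j + del_cost <= k:
            for s in dfa["states"]:
                for x in dfa["alphabet"]:
                    # deletion edge: consume x but stay where we are in P
                    step.setdefault((s, j, x), set()).add((s, j + del_cost))
    return step, eps


def eps_closure(nodes, eps):
    """All nodes reachable from `nodes` through epsilon (insertion) edges."""
    stack, seen = list(nodes), set(nodes)
    while stack:
        node = stack.pop()
        for nxt in eps.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def pk_accepts(dfa, k, v, ins_cost=1, del_cost=2):
    """Return True if P_k accepts v, i.e. some c in L(P) has M(v, c) <= k."""
    step, eps = build_pk(dfa, k, ins_cost, del_cost)
    current = eps_closure({(dfa["start"], 0)}, eps)
    for x in v:
        moved = set()
        for (s, j) in current:
            moved |= step.get((s, j, x), set())
        current = eps_closure(moved, eps)
    return any(s in dfa["accepting"] for (s, _) in current)


for word in ["xz", "zxyz", "xx"]:
    print(word, pk_accepts(XYZ, 2, word))  # xz True, zxyz True, xx False
```

Raising k to 4 makes "xx" acceptable as well, matching M(“xx”, “xyz”) = 4.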

#19

Correctness and Costs

• If v ∈ L(Pk) then ∃c such that M(c,v) ≤ k and c ∈ L(P)
  – “If our constructed NFA accepts the violation v then we can find a patch c.”
• How big? Pk has edge size O(|Σ| × |S|)
  – Deletion edges are the big contributors.
  – Can rapidly check to see if Pk accepts v
  – |v| is small: pumping lemma, “Path Slicing” (e.g., in [JM05] the largest post-slicing is ~40)
  – In practice |S| ≤ 10 (largest used is ~30)

#20

Multiple Paths

• So far we have fixed only the violating path
• What about other overlapping paths?
  – We cannot introduce new violations of P
• Find the path predicate p for v
  – “p is true ⇔ we are executing v”
• Guard all changes with “if (p)”
• Deletion locations are exact
• Insertions have multiple possible locations
  – Consider them all, choose smallest patch

#21

Other Details

• When inserting an event we may have to suggest function arguments
  – For typestate policies (e.g., locks, sockets) the typestate argument is clear.
  – Other arguments are left as /* FIXME */.
• Exceptions are folded into path predicates
  – We track where the exception was raised in v
  – To insert Y guarded by “Exc raised at X”:
      try { X; } finally (Exc e) { Y; raise e; }

#22

Experimental Setup

• Use 2 tools to find 76 bugs in 7 programs
• For each program, choose half of the bugs at random and generate patches for them
• For each program, submit all bugs at once
  – Each report is either normal (counterexample)
  – Or extended (also explanatory patch)
• Measure how many are addressed in 2 weeks
  – Addressed = “dev says so” or “manual CVS inspection”

#23

Bug Reports Addressed

• 43% of reports addressed in 2 weeks
• Null Hypothesis: patches did not matter; each bug was fixed with probability 43%
• For this experiment (χ² = 4.378), p = 0.0365

Program          LOC    Bugs   Normal   Patch
hsqldb           65k    20     4        5
ssl-explorer     102k   10     0        0
mckoi-sql        116k   6      0        0
openwfe          128k   4      0        2
jboss            145k   26     1        13
jasper reports   152k   4      0        2
azureus          232k   6      3        3
Total            940k   76     8        25

#24

Conclusions

• Can use explanatory patches to make bug reports more likely to be addressed
  – Use fuzzy string matching and edit distances to find something “close” to the violating path in the language of the specification
  – Produce insert/delete annotations
  – Insert/delete events guarded by path predicates
  – Make a patch; reports are addressed more often
• Summary: given a buggy path, invent non-buggy code most similar to the code on that path and include it as an explanatory patch
  – Like “peephole optimization” for bug-finding

#25

Questions?

• I encourage difficult questions.

#26

Statistics, Experiments

• One null hypothesis for this experiment is that candidate patches did not matter at all and that each bug was fixed with probability 43%.
• In statistical hypothesis testing a p-value is the probability, assuming that the null hypothesis is true, of getting a result at least as unfavorable to the null hypothesis as the observed value. A standard cutoff p < 0.05 is used to judge statistically significant values.
• For this experiment (χ² = 4.378), p = 0.0365
• Did not measure how “close” the candidate was to the actual fix
• Does not support the claim that reports with patches take less time/effort to fix
• Might other feedback have helped? Say, taking 10 lines from the “minimal cause” part of the backtrace? Here the average backtrace length was already 6 (max 34).
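The quoted p-value can be roughly reproduced from the test statistic alone. The sketch below assumes the chi-squared test had one degree of freedom, which the slides do not state; the function name `chi2_p_value_df1` is hypothetical.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail p-value of a chi-squared statistic.

    ASSUMPTION: one degree of freedom. In that case the statistic is the
    square of a standard normal variable, so the tail probability is
    erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


p = chi2_p_value_df1(4.378)
print(round(p, 3))  # close to the 0.0365 quoted on the slide
print(p < 0.05)     # below the standard significance cutoff
```

Under that assumption the result agrees with the slide to about three decimal places; the small remaining difference is consistent with rounding of the reported statistic.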

#27

Construction Work

• Recall P = <Σ, S, s0, δ, F>
• Let Pk = <Σ, S′, s0′, δ′, F′>
• S′ = S × {0, 1, …, k}
• s0′ = <s0, 0>
• F′ = { <s, j> | s ∈ F ∧ 0 ≤ j ≤ k }
• δ′ = ?
  – Reaching <s, j> on input x to Pk means that there is a string y with M(x, y) ≤ j such that P reaches s on input y.

#28

Constructed Transitions

• N = { <<s,j>, x, <t,j>> | <s,x,t> ∈ δ ∧ 0 ≤ j ≤ k }
  – “stay within a layer and advance normally”
• D = { <<s,j>, x, <s,j+Md(x)>> | s ∈ S ∧ x ∈ Σ ∧ 0 ≤ j ≤ k−Md(x) }
  – “delete (consume) x, stay where you are in P, but move down Md layers”
• I = { <<s,j>, ε, <t,j+Mi(x)>> | <s,x,t> ∈ δ ∧ 0 ≤ j ≤ k−Mi(x) }
  – “insert (make up) x, transition on x in P, but move down Mi layers”
• δ′ = N ∪ D ∪ I
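The N, D and I edges can also be used to recover the nearest accepted string c together with the edit script that explains how to get from v to c. The sketch below is not the talk's iterate-over-k construction: it swaps in a shortest-path (Dijkstra) search over the same three kinds of edges, which finds the smallest workable k directly. The name `nearest_accepted`, the dictionary encoding, and the uniform costs (ins=1, del=2) are assumptions.

```python
import heapq

# Hypothetical encoding of the example policy P with L(P) = {"xyz"}.
XYZ = {
    "start": "A",
    "accepting": {"D"},
    "delta": {("A", "x"): "B", ("B", "y"): "C", ("C", "z"): "D"},
}


def nearest_accepted(dfa, v, ins_cost=1, del_cost=2):
    """Return (cost, c, script) for a cheapest repair of v, or None."""
    out_edges = {}
    for (s, x), t in dfa["delta"].items():
        out_edges.setdefault(s, []).append((x, t))
    # Heap entries: (cost, tie-breaker, position in v, state of P, script).
    heap = [(0, 0, 0, dfa["start"], ())]
    done = set()
    counter = 1  # unique tie-breaker so states and scripts are never compared
    while heap:
        cost, _, pos, state, script = heapq.heappop(heap)
        if (pos, state) in done:
            continue
        done.add((pos, state))
        if pos == len(v) and state in dfa["accepting"]:
            c = [x for op, x in script if op != "del"]
            return cost, c, list(script)
        moves = []
        if pos < len(v):
            x = v[pos]
            if (state, x) in dfa["delta"]:
                # N edge: consume x and advance in P at no cost
                moves.append((cost, pos + 1, dfa["delta"][(state, x)], ("keep", x)))
            # D edge: consume x, stay put in P, pay the deletion cost
            moves.append((cost + del_cost, pos + 1, state, ("del", x)))
        for x, t in out_edges.get(state, []):
            # I edge: make up x, advance in P, pay the insertion cost
            moves.append((cost + ins_cost, pos, t, ("ins", x)))
        for new_cost, new_pos, new_state, op in moves:
            if (new_pos, new_state) not in done:
                heapq.heappush(
                    heap, (new_cost, counter, new_pos, new_state, script + (op,)))
                counter += 1
    return None  # no accepting state is reachable, so L(P) is empty


print(nearest_accepted(XYZ, "xz"))
# (1, ['x', 'y', 'z'], [('keep', 'x'), ('ins', 'y'), ('keep', 'z')])
```

For the input "xx" the same search returns cost 4 with c = x, y, z, in line with the earlier slide; several edit scripts tie at that cost, so only the cost and c are determined.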

#29

Actual Patch

*** jasperreports-1.2.0-orig/src/net/sf/jasperreports/engine/util/JRStringUtil.java
--- jasperreports-1.2.0/src/net/sf/jasperreports/engine/util/JRStringUtil.java
***************
*** 118,126 ****
   */
  public static String htmlEncode(String text)
  {
! int length = text.length();
! if (text != null && length > 0)
  {
  StringBuffer ret = new StringBuffer(length * 12 / 10);

  boolean isEncodeSpace = true;
--- 121,132 ----
   */
  public static String htmlEncode(String text)
  {
! if (text != null) {
+ int length = text.length();
+ if (length > 0)
  {
  StringBuffer ret = new StringBuffer(length * 12 / 10);
  boolean isEncodeSpace = true;
***************
*** 214,219 ****
--- 220,226 ----
  return text;
  }
+ }