CS4723 Software Engineering Lecture 10 Debugging and Fault Localization

Upload: lillian-townsend

Post on 18-Dec-2015

Debugging

What we do when testing finds a bug

Basic process:

Reproduce the bug

Locate the fault

Fix it


Debugging

Sometimes the software is too large to debug directly

Before we can make the fix, we narrow things down:

Narrow down the relevant input: delta debugging

Narrow down the relevant code: statistical debugging, dynamic slicing


Debugging

The inputs can be very complex, which is quite common in the real world (compilers, office suites, browsers, databases, operating systems, …)

It is important to isolate just the relevant inputs:

Shorter execution to debug

Less noise

Easier to identify the root cause of the bug


Consider Mozilla Firefox

Firefox takes HTML pages as input, and a large number of its bugs are related to loading certain HTML pages:

Corner cases in HTML syntax

Incompatibility between browsers

Corner cases in JavaScript, CSS, …

Error handling for incorrect HTML, JavaScript, CSS, …


How do we go from this:

<SELECT NAME="op_sys" MULTIPLE SIZE=7><OPTION VALUE="All">All<OPTION VALUE="Windows 3.1">Windows 3.1<OPTION VALUE="Windows 95">Windows 95<OPTION VALUE="Windows 98">Windows 98<OPTION VALUE="Windows ME">Windows ME<OPTION VALUE="Windows 2000">Windows 2000<OPTION VALUE="Windows NT">Windows NT<OPTION VALUE="Mac System 7">Mac System 7<OPTION VALUE="Mac System 7.5">Mac System 7.5<OPTION VALUE="Mac System 7.6.1">Mac System 7.6.1<OPTION VALUE="Mac System 8.0">Mac System 8.0<OPTION VALUE="Mac System 8.5">Mac System 8.5<OPTION VALUE="Mac System 8.6">Mac System 8.6<OPTION VALUE="Mac System 9.x">Mac System 9.x<OPTION VALUE="MacOS X">MacOS X<OPTION VALUE="Linux">Linux<OPTION VALUE="BSDI">BSDI<OPTION VALUE="FreeBSD">FreeBSD<OPTION VALUE="NetBSD">NetBSD<OPTION VALUE="OpenBSD">OpenBSD<OPTION VALUE="AIX">AIX<OPTION VALUE="BeOS">BeOS<OPTION VALUE="HP-UX">HP-UX<OPTION VALUE="IRIX">IRIX<OPTION VALUE="Neutrino">Neutrino<OPTION VALUE="OpenVMS">OpenVMS<OPTION VALUE="OS/2">OS/2<OPTION VALUE="OSF/1">OSF/1<OPTION VALUE="Solaris">Solaris<OPTION VALUE="SunOS">SunOS<OPTION VALUE="other">other</SELECT></td>
<td align=left valign=top><SELECT NAME="priority" MULTIPLE SIZE=7><OPTION VALUE="--">--<OPTION VALUE="P1">P1<OPTION VALUE="P2">P2<OPTION VALUE="P3">P3<OPTION VALUE="P4">P4<OPTION VALUE="P5">P5</SELECT></td>
<td align=left valign=top><SELECT NAME="bug_severity" MULTIPLE SIZE=7><OPTION VALUE="blocker">blocker<OPTION VALUE="critical">critical<OPTION VALUE="major">major<OPTION VALUE="normal">normal<OPTION VALUE="minor">minor<OPTION VALUE="trivial">trivial<OPTION VALUE="enhancement">enhancement …


To this…

<SELECT NAME="priority" MULTIPLE SIZE=7>


Motivation

Turn bug reports containing real web pages into minimized test cases

The minimized test case must still reveal the bug

Benefits of simplification:

Easier to communicate

Helps remove duplicate bug reports

Easier debugging: less potentially buggy code is involved, and the execution is shorter


Delta Debugging

Problem definition: a program exhibits an error for some input

The input is a set of elements

E.g., a sequence of API calls, a text file, a serialized object, …

Goal: find a smaller subset of the input that still causes the failure


A generic algorithm

How do people handle this problem?

Binary search: cut the input in half

Try to reproduce the bug with each half

Iterate on the half that still fails


Delta Debugging Version 1

Let I be the set of elements in the bug-revealing input

Assumptions:

Every subset of I is a valid input, i.e., running it yields either success or failure

A single input element E causes the failure

E causes the failure in every case, whatever other elements it is combined with (monotonicity)


The solution is simple

Follow the binary search process

Throw away half of the input elements if the remaining elements still cause the failure


Once a single element remains, we are done!
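A minimal Python sketch of Version 1, assuming the three assumptions above hold. `dd_v1` and the `fails` test function are hypothetical names introduced here; `fails` stands in for running the program on a subset of the input and checking whether the failure reappears.

```python
from typing import Callable, Hashable, List, Set


def dd_v1(elements: List[Hashable], fails: Callable[[Set[Hashable]], bool]) -> Hashable:
    """Delta Debugging Version 1: binary search for the single failure-causing element.

    Assumes every subset is a valid input, exactly one element E causes the
    failure, and E fails whatever it is combined with (monotonicity).
    """
    candidates = list(elements)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first, second = candidates[:mid], candidates[mid:]
        if fails(set(first)):       # the failure survives: throw away the second half
            candidates = first
        elif fails(set(second)):    # otherwise keep the second half
            candidates = second
        else:                       # neither half fails alone: the assumptions do not hold
            raise ValueError("neither half reproduces the failure (interference?)")
    return candidates[0]


# Hypothetical test function: the program fails whenever element 6 is present.
runs = []


def fails(subset: Set[Hashable]) -> bool:
    runs.append(sorted(subset))
    return 6 in subset


print(dd_v1(list(range(1, 9)), fails))  # 6
print(len(runs))                        # 5
```

Each round costs at most two test runs, so the number of runs grows with the logarithm of the input size (5 runs for 8 elements here).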


Delta Debugging Version 1

This is just binary search: easy to automate

The assumptions do not always hold

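Let’s look at the assumptions, writing $X = \text{fail}$ when the program fails on the input subset $X$ and $X = \text{pass}$ when it succeeds:

$I_1 \cup I_2 = \text{fail} \;\Rightarrow\; (I_1 = \text{fail} \wedge I_2 = \text{pass}) \vee (I_1 = \text{pass} \wedge I_2 = \text{fail})$

It is interesting to see what happens when this is not the case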


Case I: multiple failing branches

What happens if $I_1 = \text{fail}$ and $I_2 = \text{fail}$?

A subset of I1 fails, and a subset of I2 also fails

We can simply continue searching both I1 and I2, and we find two failure-causing elements

They may or may not be due to the same bug


Case II: Interference

What happens if $I_1 = \text{pass}$ and $I_2 = \text{pass}$, although $I_1 \cup I_2 = \text{fail}$?

This means that a subset of I1 and a subset of I2 cause the failure only when they are combined

This is called interference


Handling Interference

The trick: consider $I_1 = \text{pass}$ and $I_2 = \text{pass}$, but $I_1 \cup I_2 = \text{fail}$

Then an element D1 in I1 and an element D2 in I2 together cause the failure

We do a binary search in I2 while keeping all of I1

Split I2 into P1 and P2, and try I1 ∪ P1 and I1 ∪ P2

Continue until we find D2 such that I1 ∪ D2 causes the failure

Then we do a binary search in I1 while keeping D2, until we find D1

Return D1 ∪ D2
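The two nested binary searches can be sketched as follows, assuming (as on this slide) that a single element D1 in I1 and a single element D2 in I2 cause the failure together. The function names and the `fails` test function are hypothetical; the sketch gives up with an error if both I1 ∪ P1 and I1 ∪ P2 pass, the harder case that Version 2 deals with.

```python
from typing import Callable, Hashable, List, Set

Fails = Callable[[Set[Hashable]], bool]


def search_with(candidates: List[Hashable], context: Set[Hashable], fails: Fails) -> Hashable:
    """Binary search for one element D of `candidates` such that context ∪ {D} fails."""
    while len(candidates) > 1:
        mid = len(candidates) // 2
        p1, p2 = candidates[:mid], candidates[mid:]
        if fails(context | set(p1)):      # try context ∪ P1
            candidates = p1
        elif fails(context | set(p2)):    # try context ∪ P2
            candidates = p2
        else:                             # both pass again: a second interference
            raise ValueError("second interference: not handled by this sketch")
    return candidates[0]


def handle_interference(i1: List[Hashable], i2: List[Hashable], fails: Fails) -> Set[Hashable]:
    """Assumes I1 and I2 pass on their own but I1 ∪ I2 fails; returns {D1, D2}."""
    d2 = search_with(i2, set(i1), fails)  # binary search in I2, keeping all of I1
    d1 = search_with(i1, {d2}, fails)     # binary search in I1, keeping only D2
    return {d1, d2}


# Example I: elements 3 and 7 cause the failure only when applied together.
print(handle_interference([1, 2, 3, 4], [5, 6, 7, 8], lambda s: {3, 7} <= s))  # {3, 7}
```

On Example I the configurations it tries are 1 2 3 4 5 6, 1 2 3 4 7 8, 1 2 3 4 7, 1 2 7, 3 4 7 and 3 7, the same sequence shown in the Example I slide.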


Example I: Handling interference

Consider 8 input elements, of which 3 and 7 cause the failure when applied together

Configuration      Result
1 2 3 4            pass
5 6 7 8            pass    Interference!
1 2 3 4 5 6        pass
1 2 3 4 7 8        fail
1 2 3 4 7          fail
1 2 7              pass
3 4 7              fail
3 7                fail


Example II: Handling multiple interference

Consider 8 input elements, of which 3, 5 and 7 cause the failure when applied together

Configuration      Result
1 2 3 4            pass
5 6 7 8            pass    Interference!
1 2 3 4 5 6        pass
1 2 3 4 7 8        pass    Second interference! What to do?
1 2 3 4 5 6 7      fail
1 2 3 4 5 7        fail
1 2 5 7            pass
3 4 5 7            fail
3 5 7              fail

Go on with I1 ∪ P1! Keep 1 2 3 4 5 6 fixed and continue searching 7 8


Delta Debugging Version 2

Let I be the set of elements in the bug-revealing input

New assumptions:

Every subset of I is a valid input

A subset of input elements E (not necessarily a single element) causes the failure

E causes the failure in every case, whatever other elements it is combined with (monotonicity)


Delta Debugging Version 2

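Algorithm: split I into I1 and I2

Case I: $I_1 = \text{fail}$ and $I_2 = \text{pass}$

Continue with I1

Case I: $I_1 = \text{pass}$ and $I_2 = \text{fail}$

Continue with I2

Case I: $I_1 = \text{fail}$ and $I_2 = \text{fail}$

Continue with both I1 and I2

Case II: $I_1 = \text{pass}$ and $I_2 = \text{pass}$

Handle interference between I1 and I2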
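One way to turn the four cases into code is a recursive function that carries a fixed context r (the elements kept while searching a half). This is a sketch under the Version 2 assumptions, not a definitive implementation; `dd_v2` and `fails` are hypothetical names, and the interference case follows the trick described earlier: search I2 while keeping I1, then search I1 while keeping what was found in I2.

```python
from typing import Callable, Hashable, List, Optional, Set

Fails = Callable[[Set[Hashable]], bool]


def dd_v2(c: List[Hashable], fails: Fails, r: Optional[Set[Hashable]] = None) -> Set[Hashable]:
    """Return a subset of c that, together with the fixed context r, still fails.

    Precondition (Version 2 assumptions): fails(set(c) | r) is True and the
    failure is monotonic, i.e. adding elements never makes it disappear.
    """
    r = set() if r is None else r
    if len(c) <= 1:
        return set(c)
    mid = len(c) // 2
    i1, i2 = c[:mid], c[mid:]
    f1 = fails(set(i1) | r)
    f2 = fails(set(i2) | r)
    if f1 and f2:    # Case I, both fail: continue with both halves
        return dd_v2(i1, fails, r) | dd_v2(i2, fails, r)
    if f1:           # Case I: continue with I1
        return dd_v2(i1, fails, r)
    if f2:           # Case I: continue with I2
        return dd_v2(i2, fails, r)
    # Case II, interference: search I2 while keeping all of I1,
    # then search I1 while keeping only what was found in I2.
    d2 = dd_v2(i2, fails, r | set(i1))
    d1 = dd_v2(i1, fails, r | d2)
    return d1 | d2


# Example II: elements 3, 5 and 7 cause the failure only when applied together.
print(dd_v2(list(range(1, 9)), lambda s: {3, 5, 7} <= s))  # {3, 5, 7}
```

On Example II, the failing configurations it encounters (1 2 3 4 5 6 7, 1 2 3 4 5 7, 3 4 5 7, 3 5 7) match the example; it also runs a few passing configurations that the example does not list.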


Real example: GNU Compiler

This input program (bug.c) causes GCC 2.95.2 to crash when all optimizations are enabled

Minimize it to debug GCC

Consider each character as an element


Real example: GNU Compiler

Our delta debugging process:

Create the appropriate subset of bug.c

Feed it to GCC

Continue according to whether GCC crashes
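The "feed it to GCC" step needs an automatic test oracle. Below is a minimal sketch, assuming a POSIX system where a compiler crash shows up either as a negative return code (the process was killed by a signal) or as GCC's "internal compiler error" message. `make_crash_oracle` and the compiler command passed to it are hypothetical, not part of the original experiment.

```python
import os
import subprocess
import tempfile
from typing import Callable, Sequence


def make_crash_oracle(compiler_cmd: Sequence[str], suffix: str = ".c",
                      timeout: float = 60.0) -> Callable[[str], bool]:
    """Build a test function: does `compiler_cmd <file>` crash on this source text?"""

    def crashes(source: str) -> bool:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "candidate" + suffix)
            with open(path, "w") as f:
                f.write(source)
            try:
                proc = subprocess.run([*compiler_cmd, path], capture_output=True,
                                      text=True, timeout=timeout)
            except subprocess.TimeoutExpired:
                return False  # a hang is not the crash we are looking for
            # On POSIX, a negative return code means the process died from a signal;
            # GCC reports crashes inside the compiler as "internal compiler error".
            return proc.returncode < 0 or "internal compiler error" in proc.stderr

    return crashes


# Hypothetical usage (the flags are illustrative only):
#   crashes = make_crash_oracle(["gcc", "-O2", "-c", "-o", os.devnull])
#   text = open("bug.c").read()
#   fails = lambda idxs: crashes("".join(text[i] for i in sorted(idxs)))
# Character positions 0 .. len(text)-1 then serve as the input elements.
```

Using character positions (rather than the characters themselves) as elements keeps duplicates distinct and lets each subset be turned back into text in the original order.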


GCC compiler example

The minimized code:

t(double z[],int n){int i,j;for(;;){i=i+j+1;z[i]=z[i]*(z[0]+0);}return z[n];}

The test case is 1-minimal: no single character can be removed

Even every space has been removed

The function name has been shortened from mult to a single t

GCC was executed 700+ times

The input was reduced to 10% of its initial size
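1-minimality can be checked mechanically: the input must still fail, and deleting any single character must make it pass. A small sketch; `is_one_minimal` and the substring-based oracle are hypothetical stand-ins for the real GCC test.

```python
from typing import Callable


def is_one_minimal(text: str, fails: Callable[[str], bool]) -> bool:
    """True if `text` still fails but removing any single character makes it pass."""
    if not fails(text):
        return False
    return all(not fails(text[:i] + text[i + 1:]) for i in range(len(text)))


# Hypothetical oracle: the "crash" happens whenever the input contains "z[i]".
def crash(s: str) -> bool:
    return "z[i]" in s


print(is_one_minimal("z[i]", crash))     # True
print(is_one_minimal("z[i]=0;", crash))  # False: "=0;" can still be dropped
```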


Another example: GDB

GDB is the GNU debugger

It was updated from version 4.16 to 4.17

Version 4.17 is no longer compatible with DDD (a GUI for GNU software development tools)

178,000 lines of code changed from 4.16

How do we find which code change(s) cause the failure?


Results

After a lot of work (by machine), the 178 KLOC of changes were grouped into 8,700 groups (commits)

Using delta debugging, the failure-causing change was found in 470 tests

It took 48 hours

Doing this by hand would be a nightmare!


Importance of input elements

It is important to have a good definition of input elements, so that:

Subsets of input elements are valid inputs

The number of input elements is small

Consider the examples:

GCC example: we use characters as elements, which is simple but not ideal. If the bug is in a phase after the parser, most subsets are rejected with syntax errors before they reach the buggy code, so the failure is not monotonic

GDB example: we group lines of code into groups, reducing the input size to 5% of the original. Two days are acceptable, but what about 40 days?


Limitations of Delta debugging

Relies on its assumptions, and monotonicity does not always hold

Relies on good input elements; element definitions that always yield valid inputs improve efficiency

Requires an automatic test oracle: good for regression testing, not good for development-time testing


Statistical Debugging

Delta debugging: narrow down the input to be considered

Statistical debugging: narrow down the code to be considered


Statistical Debugging

Basic idea: consider a number of test cases, some of which pass and some of which fail

If a statement is covered mostly by failing test cases, it is likely to be the buggy part of the code


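Tarantula: a classic tool for statistical debugging

It colors each statement s using the following formulas, where pass(s) and fail(s) are the fractions of passing and failing test cases that cover s:

$\mathit{color}(s) = \mathit{red} + \frac{\mathit{pass}(s)}{\mathit{fail}(s) + \mathit{pass}(s)} \times (\mathit{green} - \mathit{red})$

$\mathit{brightness}(s) = \max(\mathit{pass}(s), \mathit{fail}(s))$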
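The two formulas can be computed per statement as below. This is a minimal sketch, assuming pass(s) and fail(s) are fractions of the passing and failing tests that cover s; `tarantula` is a hypothetical helper that returns the hue as a number in [0, 1] (0 = red, 1 = green) rather than an actual color.

```python
from typing import Tuple


def tarantula(passed_s: int, failed_s: int,
              total_passed: int, total_failed: int) -> Tuple[float, float]:
    """Return (hue, brightness) for one statement s.

    passed_s / failed_s: number of passing / failing tests that cover s.
    hue is in [0, 1]: 0 means red (suspicious), 1 means green;
    an actual color is red + hue * (green - red).
    """
    pass_frac = passed_s / total_passed if total_passed else 0.0   # pass(s)
    fail_frac = failed_s / total_failed if total_failed else 0.0   # fail(s)
    if pass_frac + fail_frac == 0:
        raise ValueError("statement s is not covered by any test")
    hue = pass_frac / (fail_frac + pass_frac)
    brightness = max(pass_frac, fail_frac)
    return hue, brightness


# Hypothetical statement: covered by 1 of 3 passing tests and by the only failing test.
hue, brightness = tarantula(passed_s=1, failed_s=1, total_passed=3, total_failed=1)
print(round(hue, 2), brightness)  # 0.25 1.0
```

Such a statement is drawn mostly red at full brightness, i.e. it is ranked as highly suspicious.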


Context-based statistical debugging: do not consider each statement in isolation

Also consider connections:

Outcomes of branches

Connections (edges) on a runtime control flow graph (runtime CFG)


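Runtime Control Flow Graph

void replaceFirst(sx, sy) {
    // arr, len and sz are defined outside this method
    for (int i = 0; i < len; i++) {
        if (arr[i] == sx) {
            arr[i] = sz;
            // should break;
        }
        if (arr[i] == sy) {
            arr[i] = sz;
            // should break;
        }
    }
}

[Figure: runtime control flow graph of replaceFirst for two passing runs and one failing run]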


Limitations: two questions

If a statement is covered only by passing test cases, can it be the root cause of the bug found?

If a statement is covered only by failing test cases, must it be the root cause of the bug found?


Example

void f(int a, int b) {
    if (a > 0) {   // error: should be >=
        /* do something */
    }
    if (b < 0) {
        /* do something */
    }
}

Test cases (a, b): (3, 2), (2, 1), (0, -1), (2, 0)
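To see how this example bears on the two questions above, the sketch below ports f to Python and records which statements each test covers. Assumptions: the port returns which branch bodies ran as its observable behavior, and `f_expected` is a hypothetical oracle encoding the intended `a >= 0`.

```python
from typing import List, Set, Tuple


def f_buggy(a: int, b: int, covered: Set[str]) -> Tuple[bool, bool]:
    """Python port of f(): records executed statements, returns which bodies ran."""
    covered.add("if (a > 0)")
    took_first = a > 0              # bug: should be a >= 0
    if took_first:
        covered.add("first body")
    covered.add("if (b < 0)")
    took_second = b < 0
    if took_second:
        covered.add("second body")
    return took_first, took_second


def f_expected(a: int, b: int) -> Tuple[bool, bool]:
    """Hypothetical oracle: the behavior f() should have."""
    return a >= 0, b < 0


def classify(tests: List[Tuple[int, int]]) -> Tuple[Set[str], Set[str]]:
    """Return (statements covered only by failing tests, only by passing tests)."""
    passing: Set[str] = set()
    failing: Set[str] = set()
    for a, b in tests:
        covered: Set[str] = set()
        if f_buggy(a, b, covered) == f_expected(a, b):
            passing |= covered
        else:
            failing |= covered
    return failing - passing, passing - failing


only_failing, only_passing = classify([(3, 2), (2, 1), (0, -1), (2, 0)])
print(only_failing)  # {'second body'}
print(only_passing)  # {'first body'}
```

Only the failing test (0, -1) reaches the second body, yet that statement is not the bug. The faulty predicate `if (a > 0)` is covered by every test, and the first body, covered only by passing tests, is exactly the code the failing test should have executed.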


Dynamic Slicing: another way to narrow down the code to be considered in debugging

Recall static slicing:

All code elements that affect, or are affected by, a certain variable

Generate a large dependency graph for the code

Do reachability analysis on the graph


Data Dependencies

Data dependencies are dependencies from the use of a variable to the definition of that variable

Example:

s1: x = 3;
s2: if (y > 5) {
s3:     y = y + x;   // data-dependent on x defined in s1
s4: }


Control Dependencies

Control dependencies are dependencies from the statements in a branch to the predicate that decides whether the branch executes

Example:

s1: x = 3;
s2: if (y > 5) {
s3:     y = y + x;   // control-dependent on the predicate (y > 5) in s2
s4: }
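Static slicing then reduces to graph reachability. A sketch over the s1-s4 example, assuming a hand-built dependence graph `DEPENDS_ON` (a hypothetical name) whose edges point from a statement to the statements it depends on: following the edges gives the backward slice, and following them in reverse gives the forward slice.

```python
from collections import deque
from typing import Dict, Set

# Edges point from a statement to the statements it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "s1": set(),          # x = 3
    "s2": set(),          # if (y > 5): y is defined outside this fragment
    "s3": {"s1", "s2"},   # y = y + x: data-dependent on s1, control-dependent on s2
}


def backward_slice(graph: Dict[str, Set[str]], criterion: str) -> Set[str]:
    """All statements reachable from `criterion` along dependence edges (itself included)."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        node = work.popleft()
        for dep in graph.get(node, set()):
            if dep not in seen:
                seen.add(dep)
                work.append(dep)
    return seen


def reverse(graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Flip every edge, so reachability yields the statements affected by a node."""
    rev: Dict[str, Set[str]] = {node: set() for node in graph}
    for node, deps in graph.items():
        for dep in deps:
            rev.setdefault(dep, set()).add(node)
    return rev


print(sorted(backward_slice(DEPENDS_ON, "s3")))           # ['s1', 's2', 's3']
print(sorted(backward_slice(reverse(DEPENDS_ON), "s1")))  # ['s1', 's3']
```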


Program slicing for sum = 0 -> sum = 1

[Figure: system dependence graph for main (sum=0; i=1; while i<11 { sum=add(sum,i); i=add(i,1) }) and add(a, b) (add$result=a+b), with entry, call-site, actual-in/actual-out and formal-in/formal-out nodes; one part of the graph is marked "???"]


Dynamic Slicing: also describes dependencies among code elements, but only those that occur in one particular execution

If a variable has an incorrect value, the bug should be in its backward dynamic slice

Like the runtime control flow graph, it maps static slicing onto the executed code


Dynamic Slicing Example

1: b = 0
2: a = 2
3: for i = 1 to N do
4:   if ((i++) % 2 == 1) then
5:     a = a + 1
     else
6:     b = a * 2
     endif
   done
7: z = a + b
8: print(z)

For input N=2, the trace is (s_k denotes the k-th execution of statement s):

1_1: b = 0                      [b=0]
2_1: a = 2
3_1: for i = 1 to N do          [i=1]
4_1: if ((i++) % 2 == 1) then   [i=1]
5_1: a = a + 1                  [a=3]
3_2: for i = 1 to N do          [i=2]
4_2: if ((i++) % 2 == 1) then   [i=2]
6_1: b = a * 2                  [b=6]
7_1: z = a + b                  [z=9]
8_1: print(z)                   [z=9]


Algorithm I

This algorithm uses a static dependence graph in which all executed nodes are marked during execution. When the graph is traversed during slicing, unmarked nodes are skipped, since they cannot be part of the dynamic slice.

Limited dynamic information: fast but imprecise (though more precise than static slicing)
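A sketch of Algorithm I on the dynamic slicing example program, slicing from statement 8, print(z). Assumptions: `STATIC_DEPS` is a hand-derived static graph with data dependences only (control dependences are left out to keep it short), and `algorithm_one` is a hypothetical name.

```python
from typing import Dict, Set

# Hand-derived static data dependences of the example program
# (statement -> statements whose stored value it may read). Control dependences omitted.
STATIC_DEPS: Dict[int, Set[int]] = {
    1: set(),         # b = 0
    2: set(),         # a = 2
    3: {3},           # for i = 1 to N: reads i from the previous iteration
    4: {3},           # if ((i++) % 2 == 1): reads i
    5: {2, 5},        # a = a + 1: a comes from 2 or from an earlier 5
    6: {2, 5},        # b = a * 2: a comes from 2 or 5
    7: {1, 2, 5, 6},  # z = a + b: a from 2 or 5, b from 1 or 6
    8: {7},           # print(z)
}


def algorithm_one(criterion: int, executed: Set[int]) -> Set[int]:
    """Traverse the static graph from `criterion`, skipping nodes never marked as executed."""
    result, work = {criterion}, [criterion]
    while work:
        node = work.pop()
        for dep in STATIC_DEPS[node]:
            if dep in executed and dep not in result:
                result.add(dep)
                work.append(dep)
    return result


print(sorted(algorithm_one(8, executed={1, 2, 3, 4, 5, 7, 8})))  # N=1: [1, 2, 5, 7, 8]
print(sorted(algorithm_one(8, executed=set(range(1, 9)))))       # N=2: [1, 2, 5, 6, 7, 8]
```

For N=2 every statement executes, so no node is skipped and statement 1 (b=0) stays in the slice, even though statement 6 overwrites b before statement 7 reads it. This is the imprecision noted above.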


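Algorithm I Example

[Figure: dependence graph of the example program (1: b=0, 2: a=2, 3: 1 <= i <= N, 4: if ((i++)%2==1), 5: a=a+1, 6: b=a*2, 7: z=a+b, 8: print(z)) with T/F branch edges; the nodes executed for N=1 are marked]

For input N=1, the trace is: 1_1, 2_1, 3_1, 4_1, 5_1, 3_2, 7_1, 8_1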


Algorithm II

A dependence edge from a load to a store is introduced (marked) only if, at least once during execution, the value stored by the store is actually read by the load

No static analysis is needed.


Algorithm II Example

[Figure: dependence graph of the example program (statements 1 to 8, with T/F branch edges); only the dependence edges exercised during the run are marked]

For input N=1, the trace is: 1_1, 2_1, 3_1, 4_1, 5_1, 3_2, 7_1, 8_1


Algorithm II Example

[Figure: the same dependence graph as in the previous example, with the dependence edges exercised for input N=2 marked]

For input N=2, the trace is:

1_1: save b
2_1: save a
3_1: save i
4_1: load i
5_1: load/save a
3_2: load/save i
4_2: load i
6_1: load a / save b
7_1: load a, b / save z
8_1: load z
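Algorithm II can be sketched directly from this load/store trace. Assumptions: the trace is encoded by hand as `TRACE_N2`, only data dependences (load to store) are modeled, and `mark_edges` and `slice_from` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

# Load/store trace for input N=2: (instance, statement, loaded vars, stored vars).
TRACE_N2: List[Tuple[str, int, Set[str], Set[str]]] = [
    ("1_1", 1, set(), {"b"}),
    ("2_1", 2, set(), {"a"}),
    ("3_1", 3, set(), {"i"}),
    ("4_1", 4, {"i"}, set()),
    ("5_1", 5, {"a"}, {"a"}),
    ("3_2", 3, {"i"}, {"i"}),
    ("4_2", 4, {"i"}, set()),
    ("6_1", 6, {"a"}, {"b"}),
    ("7_1", 7, {"a", "b"}, {"z"}),
    ("8_1", 8, {"z"}, set()),
]


def mark_edges(trace: List[Tuple[str, int, Set[str], Set[str]]]) -> Dict[int, Set[int]]:
    """Mark an edge load-statement -> store-statement whenever the load reads that store's value."""
    last_store: Dict[str, int] = {}   # variable -> statement that last stored it
    edges: Dict[int, Set[int]] = {}
    for _instance, stmt, loads, stores in trace:
        deps = edges.setdefault(stmt, set())
        for var in loads:             # loads of a step happen before its stores
            if var in last_store:
                deps.add(last_store[var])
        for var in stores:
            last_store[var] = stmt
    return edges


def slice_from(edges: Dict[int, Set[int]], criterion: int) -> Set[int]:
    """Backward slice: statements reachable from `criterion` over marked edges."""
    result, work = {criterion}, [criterion]
    while work:
        for dep in edges.get(work.pop(), set()):
            if dep not in result:
                result.add(dep)
                work.append(dep)
    return result


print(sorted(slice_from(mark_edges(TRACE_N2), 8)))  # [2, 5, 6, 7, 8]
```

Statement 1 (b=0) drops out of the slice: statement 6 overwrites b before statement 7 loads it, so the edge from 7 to 1 is never marked.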


Algorithm II – Compare to Algorithm I

More precise

[Figure: a definition b=… and two uses …=b, shown once for Algorithm I and once for Algorithm II]


Efficiency: Summary

For an execution of 130M instructions:

Space requirement: about 1.5 GB

Time requirement: about 10 min

JSlice: http://jslice.sourceforge.net/


Dynamic Dependence Graph Sizes

Program       Statements executed (millions)   Dynamic dependence graph size (MB)
300.twolf     140                              1,568
256.bzip2     67                               1,296
255.vortex    108                              1,442
197.parser    123                              1,816
181.mcf       118                              1,535
134.perl      220                              1,954
130.li        124                              1,745
126.gcc       131                              1,534
099.go        138                              1,707


Classic Dynamic Slicing in Debugging

(LOC: lines of code; EXEC: executed code; BS: backward dynamic slice)

Buggy run          LOC     EXEC (%LOC)      BS (%EXEC)
flex 2.5.31(a)     26754   1871 (6.99%)     695 (37.2%)
flex 2.5.31(b)     26754   2198 (8.2%)      272 (12.4%)
flex 2.5.31(c)     26754   2053 (7.7%)      50 (2.4%)
grep 2.5           8581    1157 (13.5%)     NA
grep 2.5.1(a)      8587    509 (5.9%)       NA
grep 2.5.1(b)      8587    1123 (13.1%)     NA
grep 2.5.1(c)      8587    1338 (15.6%)     NA
make 3.80(a)       29978   2277 (7.6%)      981 (43.1%)
make 3.80(b)       29978   2740 (9.1%)      1290 (47.1%)
gzip-1.2.4         8164    118 (1.5%)       34 (28.8%)
ncompress-4.2.4    1923    59 (3.1%)        18 (30.5%)
polymorph-0.4.0    716     45 (6.3%)        21 (46.7%)
tar 1.13.25        25854   445 (1.7%)       105 (23.6%)
bc 1.06            8288    636 (7.7%)       204 (32.1%)
Tidy               31132   1519 (4.9%)      554 (36.5%)

BS ranges from 2.4% to 47.1% of EXEC, 30.9% on average


Advantages compared with Statistical Debugging

Error-related code is guaranteed to appear in the slice

Only requires the test case that reveals the bug, which is a large advantage for field bugs reported by users


Issues with Dynamic Slicing

Slices are usually not very small (about 30% of the executed code)

The execution history is very big (GBs)

The algorithm to compute dynamic slices is slow and has very high space requirements: on average, for an execution of 130M instructions, the constructed dependence graph requires 1.5 GB


Review of Debugging

Debugging is a process that follows testing

Steps: reproduce, localize, fix

Approaches to localization: delta debugging, statistical debugging, dynamic slicing