Automatic Program Correction Anton Akhi Friday, July 08, 2011


Automatic Program Correction

Anton Akhi, Friday, July 08, 2011


Plan

• Introduction
• Survey of existing tools


Why do we need automatic program correction?

• Even when a bug has been found, it is still the programmer's job to think of a fix
• Automatic correction fixes the bug automatically or provides high-quality fix suggestions


Isn't it magic?

• Tools require failing and passing runs
• An oracle is needed to check whether a test passes
• The buggy program is assumed to be not far from the correct one


Existing tools

• Genetic programming:
– Automatic Program Repair with Evolutionary Computation
– A Novel Co-evolutionary Approach to Automatic Software Bug Fixing
• Machine learning:
– BugFix
• Contract usage:
– Automated Debugging using Path-Based Weakest Preconditions
– AutoFix-E
– AutoFix-E2


Automatic Program Repair with Evolutionary Computation

• W. Weimer, S. Forrest, C. Le Goues, T. Nguyen
• Usage of GP to fix bugs:
– Individuals
– Genetic operators
– Fitness function


Genetic programming: individuals

• An individual is represented as an abstract syntax tree and a weighted path
• The weighted path is the list of statements visited on the negative test case, with a weight for every statement:
– 1 if not visited by positive tests
– 0.1 if visited by positive tests
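
The weighting rule above can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_path` and the representation of a trace as a list of statement ids are assumptions, not part of the original tool.

```python
def weighted_path(negative_trace, positive_traces):
    """Return (statement, weight) pairs for statements visited by the failing test.

    A statement gets weight 0.1 if some positive test also visits it,
    and weight 1.0 otherwise (the two weights quoted on the slide).
    """
    visited_by_positive = set()
    for trace in positive_traces:
        visited_by_positive.update(trace)

    path = []
    for stmt in negative_trace:
        weight = 0.1 if stmt in visited_by_positive else 1.0
        path.append((stmt, weight))
    return path


# Hypothetical statement ids: the failing run visits 1, 2, 5, 6.
print(weighted_path([1, 2, 5, 6], [[1, 2, 3], [1, 2, 4]]))
```

Statements 5 and 6 are reached only by the failing run, so they carry the full weight and are the most likely mutation targets.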


Genetic programming: operators

• Mutation:
– a statement on the weighted path is considered for mutation with probability proportional to its weight
– deletion, insertion, swap
• Crossover:
– exchange of subtrees chosen at random between two individuals
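
A minimal sketch of the weight-proportional choice of mutation sites, assuming the weighted path is a list of `(statement, weight)` pairs and that a global `mutation_rate` scales the weights; both the function name and the rate parameter are illustrative assumptions.

```python
import random


def select_for_mutation(path, mutation_rate, rng):
    """Pick statements to mutate; each is chosen with probability mutation_rate * weight."""
    return [stmt for stmt, weight in path if rng.random() < mutation_rate * weight]


rng = random.Random(0)  # seeded only to make the sketch repeatable
path = [(1, 0.1), (2, 0.1), (5, 1.0), (6, 1.0)]
print(select_for_mutation(path, mutation_rate=0.06, rng=rng))
```

Each selected statement would then be deleted, have another statement inserted after it, or be swapped with another statement.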


Genetic programming: fitness function

• Weighted sum of the test cases passed
• The weight of a negative test is at least as large as the weight of a positive test
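
As a sketch, the fitness can be written as follows. The default weights (1 and 10) are placeholders chosen for illustration; the slide only requires that a negative test weighs at least as much as a positive one.

```python
def fitness(passed_positive, passed_negative, w_pos=1.0, w_neg=10.0):
    """Weighted sum of passed tests; w_pos and w_neg are illustrative values."""
    if w_neg < w_pos:
        raise ValueError("a negative test must weigh at least as much as a positive one")
    return w_pos * passed_positive + w_neg * passed_negative


print(fitness(passed_positive=5, passed_negative=1))
```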


Minimizing changes

• Trim unnecessary edits
• A one-minimal subset of changes is a subset such that removing any single change makes the program stop passing all the tests
• Use delta debugging to compute a one-minimal subset of changes
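
The one-minimality property can be illustrated with a simple greedy reduction. This is a simplified linear scan, not the full delta debugging algorithm used by the tool; `passes_all_tests` is a hypothetical oracle supplied by the caller.

```python
def one_minimal(changes, passes_all_tests):
    """Drop changes one at a time until removing any single one breaks the tests."""
    current = list(changes)
    if not passes_all_tests(current):
        raise ValueError("the full set of changes must pass all tests")

    reduced = True
    while reduced:
        reduced = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if passes_all_tests(candidate):
                current = candidate  # change i was unnecessary
                reduced = True
                break
    return current


# Hypothetical repair: only changes "b" and "d" are actually needed.
needed = {"b", "d"}
print(one_minimal(["a", "b", "c", "d"], lambda cs: needed <= set(cs)))
```

Delta debugging reaches the same kind of result with fewer test runs by first trying to remove large chunks of changes at once.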


Automatic Program Repair with Evolutionary Computation: Conclusions

• Needs:
– negative and positive test cases
– oracle
– fault localization
• Uses:
– genetic programming
– delta debugging
• The method has drawn a lot of criticism


A Novel Co-evolutionary Approach to Automatic Software Bug Fixing

• A. Arcuri and X. Yao
• Genetic Programming
• Distance Functions
• Search Based Software Testing
• Co-evolution


Genetic Programming

• A genetic program consists of primitives
• The program is represented in tree form
• Fitness function to be minimized:

$$f(g) = \frac{N(g)}{N(g)+1} + \frac{E(g)}{E(g)+1} + \sum_{t \in T_i} d_P(t, g(t))$$

• $N(g)$ is the number of nodes in the program
• $E(g)$ is the number of raised exceptions
• $d_P(t, g(t))$ is a special distance function


Distance Functions

• Works fine with numbers and boolean expressions

• Predicates involving and can be handled only in cases small

Q


Search Based Software Testing

• Find tests that make the evolved programs fail
• Fitness function for a test case, to be maximized:

$$f(t) = \sum_{g \in G_i} d_P(t, g(t))$$


Co-evolution

• Competitive co-evolution
• First generation: copies of the buggy program and unit tests
• Mutations, crossover, replacement by the original program
• Penalty for short programs


A Novel Co-evolutionary Approach to Automatic Software Bug Fixing: Conclusions

• Needs:
– starting set of unit tests
– oracle
• Uses:
– genetic programming
– co-evolution
– search based software testing
• Some bugs are too difficult to solve
• A bug that is difficult for a human to fix might be very easy for the program


BugFix: A Learning-Based Tool to Assist Developers in Fixing Bugs

• D. Jeffrey, M. Feng, N. Gupta, R. Gupta
• Association rule learning
• Interesting value mapping pairs (IVMP)
• Situation descriptors
• Knowledgebase of rules


Association rule learning

• Association rule learning is a popular method for discovering relationships between variables in a database
• An association rule $X \Rightarrow Y$, where X and Y are sets of attributes, means that if the items in set X are present then it is probable that the items in set Y are also present
• The confidence of the rule is $conf(X \Rightarrow Y) = supp(X \cup Y) / supp(X)$, where supp(X) is the fraction of transactions containing X
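
Support and confidence are easy to compute directly from their standard definitions. The sketch below assumes transactions are given as Python sets and that X occurs in at least one transaction; the sample data is made up.

```python
def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def conf(x, y, transactions):
    """Confidence of the rule X => Y: supp(X u Y) / supp(X). Assumes supp(X) > 0."""
    return supp(set(x) | set(y), transactions) / supp(x, transactions)


# Made-up transactions, for illustration only.
transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
print(supp({"a"}, transactions))         # 3 of 4 transactions contain "a"
print(conf({"a"}, {"b"}, transactions))  # 2 of those 3 also contain "b"
```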


Interesting Value Mapping Pairs

• An Interesting Value Mapping Pair (IVMP) is a pair of value mappings (original, alternate) associated with a particular statement instance in a failing run, such that: (1) original is the original value mapping used by the failing run at that instance; and (2) alternate is an alternate (different) value mapping such that if the values in original are replaced by the values in alternate at that instance during re-execution of the failing run, then the incorrect output of the failing run becomes correct


Situation descriptors

• Statement structure situation descriptors
• IVMP pattern situation descriptors
• Value pattern situation descriptors


Statement structure situation descriptors

• Unordered tokens comprising the statement


Statement structure situation descriptors: examples


IVMP pattern situation descriptors

• A pattern is considered to occur when corresponding values in the IVMPs compare to each other in the same way across all IVMPs at a statement
• Values are compared in terms of less than, greater than, or equal to
• Look at the pairs of values:
– within original sets of values
– within alternate sets of values
– between corresponding values in the two sets


IVMP pattern situation descriptors: examples


Value pattern situation descriptors

• Similar to IVMP patterns


Knowledgebase of rules

• Database of bug-fix scenarios
• Initially created from training data of known debugging situations


How it works

• Identify rules to consider
– rules in which the debugging situation is a subset of the current debugging situation
• Sort rules by confidence values
• Report prioritized bug-fix descriptions
• Learn from the current debugging situation and the corresponding bug fix
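
The lookup steps above can be sketched as a subset test followed by a sort. The `Rule` class, its fields, and the sample descriptors are hypothetical; the real knowledgebase format is not shown in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    situation: frozenset  # situation descriptors the rule requires
    fix: str              # bug-fix description to report
    confidence: float     # confidence value of the association rule


def suggest_fixes(rules, current_situation):
    """Return fix descriptions of applicable rules, highest confidence first."""
    current = frozenset(current_situation)
    applicable = [r for r in rules if r.situation <= current]
    applicable.sort(key=lambda r: r.confidence, reverse=True)
    return [r.fix for r in applicable]


# Hypothetical knowledgebase and debugging situation.
rules = [
    Rule(frozenset({"relop:<", "loop-guard"}), "change < to <=", 0.8),
    Rule(frozenset({"relop:<"}), "swap operands", 0.3),
    Rule(frozenset({"assignment"}), "change constant", 0.9),
]
print(suggest_fixes(rules, {"relop:<", "loop-guard", "int-values"}))
```

The third rule is skipped because its situation is not a subset of the current one, even though its confidence is the highest.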


BugFix: A Learning-Based Tool to Assist Developers in Fixing Bugs: Conclusions

• BugFix assists in fixing bugs by producing a list of bug-fix suggestions
• The tool learns from new situations
• It does not handle new and logically difficult bugs well


Automated Debugging using Path-Based Weakest Preconditions

• H. He, N. Gupta
• Representation of an error trace
• Path-based weakest precondition
• Hypothesized program state
• Actual program state
• Detection of evidence
• Location and modification of the likely erroneous statement


Representation of an error trace

• $\langle i, j \rangle$ is an instance of an executed statement in an error trace; i is an execution point and j is the line number of the statement
• Branch predicates:
– atomic predicate: (expr relop const)
– compound predicates are in disjunctive normal form: $E = e_0 \lor \dots \lor e_n$, where $e_i = g_0 \land \dots \land g_n$ and $g_i$ is an atomic predicate
• Precondition and postcondition for the failing run are transformed into DNF:
– replace $\exists$ and $\forall$ by disjunction and conjunction


Path-based weakest precondition

• pwp(T, R) is the set of all states such that an execution of function F that follows execution trace T, begun in any of them, is guaranteed to terminate in a state satisfying R
• $pwp(x := a, R) = R_{x \leftarrow a}$, where $R_{x \leftarrow a}$ means substituting every occurrence of x in R with a
• $pwp(B, R) = R$, where B is a branch predicate
• $pwp(C; D, R) = pwp(C, pwp(D, R))$
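
The three rules can be sketched as a backward pass over the trace. The encoding is an assumption made for illustration: expressions are nested tuples such as `("+", "x", 1)`, variables are strings, and a trace is a list of `("assign", var, expr)` and `("branch", predicate)` steps. The branch case leaves R unchanged, following the reading pwp(B, R) = R.

```python
def substitute(expr, var, replacement):
    """Replace every occurrence of variable `var` in `expr` by `replacement`."""
    if expr == var:
        return replacement
    if isinstance(expr, tuple):
        return tuple(substitute(part, var, replacement) for part in expr)
    return expr


def pwp(trace, post):
    """Path-based weakest precondition of `post` along `trace`."""
    result = post
    # pwp(C; D, R) = pwp(C, pwp(D, R)): process the last statement first.
    for kind, *args in reversed(trace):
        if kind == "assign":
            var, value = args
            result = substitute(result, var, value)  # R with x replaced by a
        elif kind == "branch":
            pass  # the path is fixed, so the branch predicate leaves R unchanged
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return result


trace = [
    ("assign", "y", ("+", "x", 1)),
    ("branch", (">", "y", 0)),
    ("assign", "z", ("*", "y", 2)),
]
print(pwp(trace, ("==", "z", 4)))
```

For this trace the postcondition z == 4 is rewritten first to y * 2 == 4 and then to (x + 1) * 2 == 4.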


Hypothesized program state

• Let $R_i = pwp(T_{i,n}, R)$, where $T_{i,n}$ is the trace from point i to the end of the trace and R is the postcondition
• $R_i$ defines the set of hypothesized program states at execution point i
• $R_i = pwp(Stmt_i, R_{i+1})$ and $R_n = pwp(Stmt_n, R)$


Actual program state

• Represented by predicates in DNF which are true for the given input
• Consists of forward program states $Q^F_i$ and backward program states $Q^B_i$: $Q_i = Q^F_i \land Q^B_i$


Forward and backward program states

• Forward program states are defined as:
– $Q^F_1$: positive conjunctions in the precondition
– $Q^F_i = (Q^F_{i-1} - Kill_{i-1}) \cup Gen_{i-1}$
– $Q^F_{bottom} = (Q^F_n - Kill_n) \cup Gen_n$
– $Kill_i$ and $Gen_i$ are the sets of predicates killed by and derived from statement $Stmt_i$
• Backward program states are defined as:
– $Q^B_i = pwp(Stmt_i, Q^B_{i+1})$ if $Stmt_i$ is an assignment statement
– $Q^B_i = Stmt_i \land Q^B_{i+1}$ if $Stmt_i$ is a branch predicate
– $Q^B_{bottom}$


Detection of evidence

• A is less restrictive than B if $A \Rightarrow B$ is false
• Evidence at point i is a situation in which $Q_i$ is less restrictive than $R_i$
• Two types of evidence:
– Explicit
  • Type I
  • Type II
– Implicit


Types of evidence

• Explicit
– Type I: if at point i a negative predicate r of the form (0 relop const) appears in $R_i$, then r forms an explicit evidence of Type I, $E_{explicit}(any, r, i)$
– Type II: let $q: (expr_q \; relop \; const_1)$ be an atomic predicate in $Q_i$ and $r: (expr_r \; relop \; const_2)$ a negative atomic predicate in $R_i$. If $expr_q = expr_r$, then q and r form an explicit evidence of Type II, $E_{explicit}(q, r, i)$
• Implicit: a negative predicate r in $R_1$ that is not present in any explicit evidence


Location and modification of likely erroneous statement

• Use transitivity and equality to deduce new predicates. The new states are $Q^*_i$ and $R^*_i$
• Explicit Type I
– If = then match r to 0=0. Consider every assignment statement between i and the bottom of the trace as a possible candidate for modification
– Let $r_k$ from $R^*_k$ be the predicate corresponding to r, $Stmt_k: lhs := rhs$, $e \equiv (lhs - rhs = 0)$ and $c \equiv r_{k+1}$
– The goal is to make $r_k \equiv (0 = 0)$, so $pwp(lhs := rhs, r_{k+1}) \equiv (0 = 0)$
– If $r_{k+1} \equiv (lhs - rhs = 0)$ then the problem is solved


Location and modification of likely erroneous statement: Explicit Type I

• Consider e and c as sets of strings of characters
• Let $d_e$ and $d_c$ be the difference between e and c and between c and e, respectively
• If $d_e$ appears in rhs, then replace it with $d_c$
• If $d_e$ appears in lhs, then replace it with $d_c$ only if $d_c$ is a single variable
• If none of the above works, then select r as e and try every q from $Q^*_i$ as c


Location and modification of likely erroneous statement: Explicit Type II

• Consider $E_{explicit}(q, r, i)$
• Either q or r could be in error
• Change the form of r to q at i
– Same manner as above
• Change the form of q to r at i
– Change the original branch predicate from which q may be derived, or some assignment statement
– A change of an assignment statement does not change relop in q


Location and modification of likely erroneous statement: Implicit

• If there is a loop in the trace that contributes some constraints on $R^*_1$, and the missed constraints are similar to the constraints added by the loop, then try to derive the possible missing iterations of the loop
• Try to match a negative r from $R^*_1$ to some q from $Q^*_1$


Automated Debugging using Path-Based Weakest Preconditions: Conclusions

• Uses contracts
• Can handle only one error
• Cannot handle loops well


Automated Fixing of Programs with Contracts

• Yi Wei, Yu Pei, C.A. Furia, L.S. Silva, S. Buchholz, B. Meyer, A. Zeller

• Assessing Object State
• Fault Analysis
• Behavioral Models
• Generating Candidate Fixes
• Linearly Constrained Assertions
• Validating and Ranking Fixes


Assessing Object State

• Argument-less Boolean Queries
– Absolutely describe state
– Seldom preconditions
– Widely used in Eiffel contracts
• Complex Predicates
– Boolean queries are often correlated
– Implication expresses correlation
– Mine the contracts
– Mutate implications


Fault Analysis

• Find state invariants: passing and failing states – sets of predicates that hold during all passing and failing runs respectively

• Fault profile contains all predicates that hold in the passing run but not in the failing run

• Find the strongest predicate that implies the negation of violated assertion


Behavioral models

• Finite state automaton
– States are predicates that hold
– Transitions are routines
• The automaton is built based on test runs
• Determine a sequence of routine calls that change the object state appropriately
• A snippet for a set of predicates is any sequence of routines that drives the object from a state where none of the predicates hold to one where all of them hold


Generating Candidate Fixes

• Fix Schemas
• snippet is a sequence of routine calls
• old_stmt: some statements in the original program
• fail is one of:
– not p
– not p1 and not p2
– not violated_clause


Linearly Constrained Assertions

• Determine what is variable and what is constant

• Assign weights:
– Arguments in the precondition receive lower weights
– In an assertion, the weight is inversely proportional to the number of occurrences
– Identifiers that the routine can assign receive less weight


Linearly Constrained Assertions: Generating fixes

• Select a value for the variable that satisfies the constraint
– Look for extremal values
• Plug the value into a fix schema
– if not constraint then new_stmt else old_stmt end


Validating and Ranking Fixes

• A candidate is valid if it passes all the tests
• Two metrics for ranking:
– Dynamic: estimates the difference in runtime behavior between the fix and the original, based on state distance
– Static:
  • OS: 0 for schemas (a) and (b), and the number of statements in old_stmt for (c) and (d)
  • SN: number of statements in snippet
  • BF: number of branches to reach old_stmt from the point of injection of the instantiated fix schema

BFSNOS 5.25


Automated Fixing of Programs with Contracts: Conclusions

• Uses contracts, passing and failing tests to deduce and suggest bug fixes

• Successfully proposed fixes for 16 of the 42 bugs found in the EiffelBase library


Evidence-Based Automated Program Fixing

• Yu Pei, Yi Wei, C.A. Furia, M. Nordio, B. Meyer
• Predicates, Expressions, and States
• Static Analysis
• Dynamic Analysis
• Fixing Actions
• Fix Candidate Generation
• Validation of Candidates


Predicates, Expressions

• $E_r$ is the set of all non-constant expressions in routine r
• $P_r$ is a set of Boolean predicates:
– Boolean expressions: every $b \in E_r$
– Voidness checks: $e = Void$ for every $e \in E_r$
– Integer comparisons: $e \sim e'$ for every $e \in E_r$, every $e' \in (E_r \setminus \{e\}) \cup \{0\}$, and every $\sim \in \{=, <, \le\}$
– Complements: $\lnot p$ for every $p \in P_r$


States

• State Components
– $\langle l, p, v \rangle$: a triple where v is the value of predicate p at location l for some test case t which reaches l
– comp(T) denotes all the triples $\langle l, p, v \rangle$ defined by the tests $t \in T$


Static Analysis

• sub(e) is the set of all sub-expressions of e
• Expression proximity: $eprox(e_1, e_2) = |sub(e_1) \cap sub(e_2)|$
• Expression dependence between a predicate $p \in P_r$ and a contract clause c: $edep(p, c) = \frac{eprox(p, c)}{\max\{eprox(e, c) \mid e \in P_r\}}$
• Control distance $cdist(l_1, l_2)$ is the length of the shortest directed path from $l_1$ to $l_2$ on the control-flow graph
• Control dependence: $cdep(l, j) = 1 - \frac{cdist(l, j)}{\max\{cdist(\lambda, j) \mid \lambda \in r \text{ and } \lambda \text{ reaches } j\}}$
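
Expression proximity and dependence can be sketched on a toy expression encoding. The assumptions here are that expressions are nested tuples `(operator, operand, ...)` with strings and integers as leaves, that sub(e) includes e itself, and that proximity counts shared sub-expressions; none of this encoding comes from the original tool.

```python
def sub(e):
    """All sub-expressions of e, including e itself (operators are not expressions)."""
    result = {e}
    if isinstance(e, tuple):
        for operand in e[1:]:
            result |= sub(operand)
    return result


def eprox(e1, e2):
    """Expression proximity: number of shared sub-expressions."""
    return len(sub(e1) & sub(e2))


def edep(p, c, predicates):
    """Expression dependence of predicate p on clause c, normalized to [0, 1]."""
    best = max(eprox(q, c) for q in predicates)
    return eprox(p, c) / best if best else 0.0


# Hypothetical violated clause and two candidate predicates.
clause = ("<=", "i", ("+", "n", 1))
p1 = (">", "i", 0)
p2 = ("=", ("+", "n", 1), 0)
print(edep(p1, clause, [p1, p2]), edep(p2, clause, [p1, p2]))
```

Here p2 shares the whole sub-expression n + 1 with the clause, so it gets the maximal dependence, while p1 shares only the variable i.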


Dynamic Analysis

• is a score for tests• for i-th failing test and for i-th

passing test; • The evidence provided by each test case:

t it it

1,0

rcj

r PvvFuuvpldyn ||,, ,


Combining Static and Dynamic Analysis

• Combined evidence score:

$$fixme\langle l, p, v \rangle = \frac{3}{edep(p, c)^{-1} + cdep(l, j)^{-1} + dyn\langle l, p, v \rangle^{-1}}$$
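
The combined score has the shape of a harmonic mean of the three individual scores, which can be sketched as below. Returning 0 when any score is not positive is an assumption added here to avoid division by zero; the slide does not say how that case is handled.

```python
def fixme(edep, cdep, dyn):
    """Harmonic mean of the expression, control and dynamic scores."""
    scores = (edep, cdep, dyn)
    if any(s <= 0 for s in scores):
        return 0.0  # assumption: a missing score means no combined evidence
    return 3.0 / sum(1.0 / s for s in scores)


print(fixme(0.5, 0.5, 0.5))
print(fixme(1.0, 1.0, 0.1))
```

Compared with an arithmetic mean, the harmonic mean is dominated by the smallest score, so a component ranks highly only when all three analyses agree.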


Fixing Actions

• A component $\langle l, p, v \rangle$ with a high evidence score induces a number of possible actions:
– Derived Expressions
– Expression Modification
– Expression Replacement


Fix Candidate Generation

• Candidates are generated in a way similar to previous method

• Candidate is valid if it passes all the tests


Evidence-Based Automated Program Fixing: Conclusions

• Uses contracts and sets of passing and failing tests

• Combines static and dynamic approaches


Conclusion

• Automated Bug Fixing is real!
• Some bugs are still too difficult to fix or even localize
• All approaches need some kind of oracle; sometimes contracts are the oracle


References

• W. Weimer, S. Forrest, C. Le Goues, T. Nguyen, Automatic Program Repair with Evolutionary Computation

• A. Arcuri and X. Yao, A Novel Co-evolutionary Approach to Automatic Software Bug Fixing

• D. Jeffrey, M. Feng, N. Gupta, R. Gupta, BugFix: A Learning-Based Tool to Assist Developers in Fixing Bugs

• H. He, N. Gupta, Automated Debugging using Path-Based Weakest Preconditions

• Yi Wei, Yu Pei, C.A. Furia, L.S. Silva, S. Buchholz, B. Meyer, A. Zeller, Automated Fixing of Programs with Contracts

• Yu Pei, Yi Wei, C.A. Furia, M. Nordio, B. Meyer, Evidence-Based Automated Program Fixing