Page 1: Helix Automatic Software Repair with Evolutionary Computation Stephanie Forrest Westley Weimer

Helix

Automatic Software Repair

with Evolutionary Computation

Stephanie Forrest, Westley Weimer

Page 2: Introduction

- Automatic bug repair is an important unsolved problem in software engineering.
- Automated repair is needed for self-healing systems.
- “The problem of security is the problem of software.”
- We combine state-of-the-art methods from programming languages with innovations in evolutionary computation to repair bugs in publicly released software.

Page 3: Summary of Method

- Assume:
  - Access to C source code
  - Negative test case (input = 10593; output = infinite loop)
  - Positive test cases (encode required program functionality)
- Construct the abstract syntax tree (AST) using CIL.
- Evolve a repair that avoids the negative test case and passes the positive test cases.
- Minimize the repair using structural differencing and delta debugging.

Page 4: What is evolutionary computation?

Evolution in a computer:
- Individuals (genotypes) stored in the computer’s memory
- Evaluation of individuals (artificial selection)
- Differential reproduction by copying and deleting
- Variation introduced by analogy with mutation and crossover
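The four ingredients above can be sketched as a toy generational loop. This is a minimal illustration over bitstrings, not the repair system itself: the function name `evolve`, the bitstring genotype, the truncation selection, and all parameter values are assumptions made for the example.

```python
import random

def evolve(fitness, genome_len=20, pop_size=40, generations=10, seed=0):
    """Toy generational loop over bitstrings; returns the best individual seen."""
    rng = random.Random(seed)
    # Individuals (genotypes) stored in memory as lists of bits.
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Evaluation and differential reproduction: copy the fitter half, delete the rest.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)   # variation: one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(genome_len)        # variation: flip one bit (mutation)
            child[i] ^= 1
            children.append(child)
        pop = parents + children
        best = max(pop + [best], key=fitness)
    return best
```

For instance, `evolve(sum)` searches for a bitstring with as many 1 bits as possible; any function that scores a list of bits can be passed as `fitness`.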

Page 5: Example: Microsoft Zune

Dec. 31, 2008. Microsoft Zune players mysteriously freeze up.

Bug: Infinite loop when input is last day of a leap year.

Negative test case: 10593, which corresponds to Dec 31, 2008.

Repair is not trivial. Microsoft’s recommendation was to let Zune drain its battery and then reset.

Downloaded from http://pastie.org/349916 (Jan. 2009).
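The slide does not reproduce the faulty code. The sketch below is a Python port of the date-conversion loop as it is commonly reproduced (the original is C); the function names, the 1980 starting year, and the iteration guard are assumptions of this sketch. The starting year is at least consistent with the slide, since day 10593 counted from Jan 1, 1980 falls on Dec 31, 2008.

```python
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def year_from_days(days: int, max_iterations: int = 1000):
    """Convert a day count (day 1 = Jan 1, 1980) to a year.

    Returns None when the loop fails to terminate within max_iterations.
    The guard exists only so this sketch can be run safely.
    """
    year = 1980
    iterations = 0
    while days > 365:
        if is_leap_year(year):
            if days > 366:
                days -= 366
                year += 1
            # BUG: if days == 366 in a leap year, nothing changes and the loop spins.
        else:
            days -= 365
            year += 1
        iterations += 1
        if iterations > max_iterations:
            return None
    return year
```

Under these assumptions, `year_from_days(10592)` returns 2008 and `year_from_days(10594)` returns 2009, while the negative test case `year_from_days(10593)` never leaves the loop and hits the guard.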

Page 6: Evolutionary Computation Innovations

- Start with a working program
- Focus on the execution path through the AST
  - Restrict mutation and crossover to the execution path
- Represent the AST at the level of statements
  - Leaves out expressions and variable declarations
- Genetic operators
  - Don’t invent any new code
  - Crossback and macromutation operators
- Minimize repair size using structural differencing
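A statement-level mutation that never invents code can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the program is modeled as a flat list of statements rather than an AST, and the name `mutate`, the uniform choice among three operators, and the way the donor statement is picked are all hypothetical.

```python
import random

def mutate(program, weights, rng):
    """Apply one statement-level mutation at a position on the weighted path.

    program: list of statements (treated as opaque values)
    weights: one weight per statement; only positions with weight > 0 are mutated
    rng:     a random.Random instance
    """
    candidates = [i for i, w in enumerate(weights) if w > 0]
    if not candidates:
        return list(program)
    # Pick the mutation site in proportion to its path weight.
    target = rng.choices(candidates, weights=[weights[i] for i in candidates])[0]
    # The donor statement comes from the program itself, so no new code is invented.
    other = rng.randrange(len(program))
    child = list(program)
    op = rng.choice(["delete", "insert", "swap"])
    if op == "delete":
        del child[target]
    elif op == "insert":
        child.insert(target + 1, program[other])
    else:
        child[target], child[other] = child[other], child[target]
    return child
```

Because every statement in the child is copied from the parent, the search space is limited to rearrangements of existing statements. A real implementation would also have to update the weights after each edit, which this sketch omits.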

Page 7: AST Representation

Page 8: Weighted Path

- Nodes visited only by the negative test case have weight 1.0
- Nodes visited by both negative and positive test cases have weight 0.01
- All other nodes have weight 0.0
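The weighting rule can be written down directly. In this sketch the function name `path_weights` and the representation of execution paths as collections of node identifiers are assumptions; the three weight values are the ones given on the slide.

```python
def path_weights(all_nodes, negative_path, positive_paths):
    """Assign a mutation weight to every node from the test-case execution paths.

    all_nodes:      iterable of node identifiers
    negative_path:  nodes visited by the negative test case
    positive_paths: list of node collections, one per positive test case
    """
    neg = set(negative_path)
    pos = set().union(*positive_paths)
    weights = {}
    for node in all_nodes:
        if node in neg and node in pos:
            weights[node] = 0.01   # on the failing path, but also used by passing runs
        elif node in neg:
            weights[node] = 1.0    # visited by the negative test case only
        else:
            weights[node] = 0.0    # never visited by the negative test case
    return weights
```

Nodes with weight 0.0 are never selected for mutation, which is how the weighted path shrinks the search space.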

Page 9: The Final Evolved Repair

Page 10: Summary of Repairs to Date

Twenty distinct defects in 7 classes:
- Segfault: 7
- Buffer overflows: 3
- Infinite loops: 4
- Incorrect output: 2
- Integer overflow: 2
- Non-overflow DOS: 1
- Format string attack: 1

Twenty distinct programs totaling 186,603 LOC (180k LOC):
- Scientific computing: 1
- Scripting languages: 3
- Games, graphics, sound: 4
- Servers (web, ftp, authentication): 4
- Operating system utilities: 8

Page 11: Benchmark programs

Program             LOC     Functionality               Fault              Time to Repair
zune                28      media player                infinite loop      42s
gcd                 22      handcrafted example         infinite loop      153s
uniq                1146    duplicate text processing   segfault           34s
look-u              1169    dictionary lookup           segfault           45s
look-s              1363    dictionary lookup           infinite loop      55s
units               1504    metric conversion           segfault           109s
deroff              2236    document processing         segfault           131s
nullhttp            5575    webserver                   heap overflow      578s
indent              9906    source code processing      infinite loop      546s
flex                18775   lexical analyzer generator  segfault           230s
atris               21553   graphical tetris game       stack overflow     80s
openldap io.c       6519    directory protocol          non-overflow DOS   665s
lighttp fastcgi.c   13984   webserver                   heap overflow      49s
php string.c        26044   scripting language          integer overflow   6s
wu-ftpd             35109   FTP server                  format string      2256s

GECCO 2009, ICSE 2009, ACSAC (submitted)

Page 12: Time to Discover Repair

Time to repair: 3 to 10 minutes

Time includes:
- GP algorithm (selection, mutation, calculating fitness, etc.)
- Running test cases
- Pretty printing and memoizing ASTs
- gcc (compiling ASTs into executable code)

No special hardware

Page 13: Research Questions

- Does it really work? Why does it work? How can we break it?
- How does the representation affect the size of the search space?
  - Order-of-magnitude reductions
- What is the role of evolution?
  - Variable. Random search often performs as well.
- How does the number of test cases affect results?
  - Can improve results and reduce variability, but increases search time
- How does the method scale with problem size?
  - Search time scales more than linearly but less than quadratically

Page 14: Search Time Scaling

m = 1.26
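The slide gives only the value of m. Assuming m is the slope of a straight-line fit on log-log axes (an assumption, though it matches the claim under Research Questions that search time grows more than linearly but less than quadratically), search time would grow roughly as size to the power m. The helper name `scaling_factor` is hypothetical.

```python
def scaling_factor(size_ratio: float, m: float = 1.26) -> float:
    """Predicted growth in search time when problem size grows by size_ratio.

    Assumes a power law, time ~ size ** m, i.e. slope m on log-log axes.
    """
    return size_ratio ** m
```

Under that reading, doubling the problem size multiplies search time by about 2.4, compared with 2 for linear and 4 for quadratic scaling.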

Page 15: Why it Works

- Generic approach
- Powerful intermediate representation
- Weighted path greatly reduces search space
- Minimization eliminates unnecessary fixes
- Most bugs can be fixed with a few local modifications

- 667 atomic genetic operations on average to discover a repair
- Repair discovered on average in 3.6 generations
- 2.9 genetic operations per fitness evaluation

At least half the time, random search does as well as GP

Page 16: Quality of Repair

- Manual checks for repair correctness.
- Microsoft requires that security-critical changes be subjected to 100,000 fuzz inputs (randomly generated structured input strings).
- Used SPIKE black-box fuzzer (immunitysec.com) to generate 100,000 held-out fuzz requests for web server examples.
- In no case did GP repairs introduce errors that were detected by the fuzz tests, and in every case the GP repairs defeated variant attacks based on the same exploit.
- Thus, the GP repairs are not fragile memorizations of the input.
- GP repairs also correctly handled all subsequent requests from indicative workload.

Page 17: Papers and Awards

- W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest. "Automatically finding patches using genetic programming." ICSE (2009). Best Paper Award.
- S. Forrest, W. Weimer, T. Nguyen, and C. Le Goues. "A Genetic Programming Approach to Automated Software Repair." GECCO (2009). Best Paper Award.
- C. Le Goues, T. Nguyen, W. Weimer, and S. Forrest. "Closed-Loop Repair of Security Vulnerabilities." ACSAC 25 (submitted June 2009).
- Award: Human-Competitive Results Produced by Genetic and Evolutionary Computation (Humie Award). $5000.
- IFIP TC2 Manfred Paul Award for Excellence in Software: Theory and Practice. 1024 Euros.
- 2nd International Workshop on Search-Based Software Testing. Best paper and best presentation.

Page 18: The Future

- Self-healing systems for security (next talk)
- Integrating anomaly detection to find negative test cases
- Runtime repair using software dynamic translation, e.g., Strata
- Repair templates, other search methods
- Repair quality carefully
- Consistency in distributed applications? N-version diversity?
- Systematic study of large software code bases
- Hypothesis: Most bugs are small
- A small step for GP, a large step for software?

Page 19: Evolutionary computation details

- Fitness: weighted sum of the test cases that the program passes
  - F(programs that don’t compile) = 0
  - 5 positive test cases (weight = 1), 1 or 2 negative test cases (weight = 10)
- Mutation operations:
  - Delete a statement
  - Insert a statement
  - Swap a statement along the weighted path with a statement from another part of the program
- Crossover: crosses back to the original parent
- Population size is 40. Standard run is 10 generations + 10 generations
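The fitness rule above amounts to a few lines. In this sketch the function name and the representation of test results as lists of booleans are assumptions; the weights of 1 and 10 and the zero score for non-compiling variants come from the slide.

```python
def fitness(compiles, positive_results, negative_results, w_pos=1.0, w_neg=10.0):
    """Weighted sum of passed test cases; variants that do not compile score 0.

    positive_results, negative_results: lists of booleans, True when the
    variant passes the corresponding test case.
    """
    if not compiles:
        return 0.0
    return w_pos * sum(positive_results) + w_neg * sum(negative_results)
```

With 5 positive tests and 1 negative test, the unrepaired program scores 5 (it passes every positive test but fails the negative one) and a full repair scores 15, so passing the negative test outweighs all positive tests combined.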

Page 20: Minimizing the final repair

- Use tree-structured differencing (Al-Ekram et al. 2005)
  - View the primary repair as a set of tree-structured operations
- Consider the one-minimal subset of repairs
  - Let $C_p = \{c_1, c_2, \ldots, c_n\}$ be the set of changes in a primary repair
  - A one-minimal subset is a subset of $C_p$ that passes all test cases and from which no single change can be removed without failing a test
- Delta debugging: search for a one-minimal subset using binary search
  - $n^2$ time in the worst case, often linear
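The search for a one-minimal subset can be sketched as a simplified delta-debugging loop. This version only tries removing chunks of changes, halving the chunk size when nothing can be removed, and is not the authors' implementation. The names `one_minimal` and `passes` are assumptions, and `passes` stands in for applying a subset of changes and running all test cases.

```python
def one_minimal(changes, passes):
    """Return a subset of changes that passes and has no removable single change.

    changes: list of changes in the primary repair
    passes:  callable taking a list of changes, True if all test cases pass
    """
    current = list(changes)
    if not passes(current):
        raise ValueError("the full change set must pass all test cases")
    n = 2  # number of chunks the current set is split into
    while len(current) >= 2:
        size = -(-len(current) // n)  # ceiling division
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        reduced = False
        for i in range(len(chunks)):
            # Try the set with chunk i removed.
            complement = [c for j, ch in enumerate(chunks) if j != i for c in ch]
            if passes(complement):
                current = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(current):
                break  # every single-change removal failed: one-minimal
            n = min(len(current), n * 2)  # refine to smaller chunks
    if len(current) == 1 and passes([]):
        return []
    return current
```

For example, if a repair has eight changes numbered 0 to 7 and the tests pass exactly when changes 2 and 5 are both present, `one_minimal(list(range(8)), lambda s: {2, 5} <= set(s))` returns `[2, 5]`.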