
Coevolutionary Automated Software Correction

Josh Wilkerson

PhD Candidate in Computer Science

Missouri S&T

High Level View of CASC

CASC Evolutionary Model

Reproduction Phase: Programs

Randomly select a genetic operation to perform

– Probability of operation selection is configurable and/or adaptive

Select individual(s) to use

– First select a subset of individuals (i.e., a tournament)

– Then perform fitness-proportional selection within the subset (i.e., roulette-wheel selection)

– Reselection is allowed

Perform operation, generate new program(s)

Add new individuals to population

Repeat until specified number of individuals has been created
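The reproduction loop above can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than CASC's implementation: the names `select` and `reproduce` and the operator-table layout are illustrative, and integers stand in for programs so the loop runs on its own.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical operator table entry: (number of parents, function returning offspring).
Operator = Tuple[int, Callable[..., List]]


def select(population: Sequence, fitnesses: Sequence[float],
           tournament_size: int, rng: random.Random):
    """Tournament subset first, then a fitness-proportional (roulette) pick inside it.

    Each call samples independently, so the same individual can be reselected."""
    subset = rng.sample(range(len(population)), tournament_size)
    weights = [fitnesses[i] for i in subset]
    if sum(weights) > 0:
        winner = rng.choices(subset, weights=weights, k=1)[0]
    else:
        winner = rng.choice(subset)  # all-zero fitness: fall back to a uniform pick
    return population[winner]


def reproduce(population: Sequence, fitnesses: Sequence[float],
              operators: Dict[str, Operator], op_probs: Dict[str, float],
              n_offspring: int, tournament_size: int = 3, seed: int = 0) -> List:
    rng = random.Random(seed)
    names = list(operators)
    offspring: List = []
    while len(offspring) < n_offspring:
        # 1. Randomly choose a genetic operation using the configured probabilities.
        name = rng.choices(names, weights=[op_probs[n] for n in names], k=1)[0]
        n_parents, op = operators[name]
        # 2. Select parent(s); 3. apply the operation to generate new individual(s).
        parents = [select(population, fitnesses, tournament_size, rng)
                   for _ in range(n_parents)]
        offspring.extend(op(*parents, rng))
    # Crossover yields two children, so trim any overshoot past the requested count.
    return offspring[:n_offspring]


# Toy usage: integers stand in for programs.
ops = {
    "copy": (1, lambda p, rng: [p]),
    "mutate": (1, lambda p, rng: [p + rng.choice((-1, 1))]),
    "crossover": (2, lambda a, b, rng: [(a + b) // 2, (a + b + 1) // 2]),
}
probs = {"copy": 0.2, "mutate": 0.4, "crossover": 0.4}
population = [0, 5, 10, 15]
fitnesses = [0.1, 0.5, 0.9, 0.2]
children = reproduce(population, fitnesses, ops, probs, n_offspring=10)
population = population + children  # 4. add the new individuals to the population
print(children)
```

Drawing the tournament first and only then spinning the roulette wheel keeps selection pressure tunable through `tournament_size`, while fitter members of each tournament are still favored.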

Reproduction Phase: Programs

Genetic Operations

– Reset

– Copy

– Crossover

• Two individuals are randomly selected based on fitness

• Randomly select and exchange compatible sub-trees

• Generates two new programs

– Mutation

• Off-by-one mutation bias

• Randomly select an individual based on fitness

• Randomly select and change mutable node

• Generate a new sub-tree (if necessary)

– Architecture Altering Operations

• Delete a line, add assignment, add flow control
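Crossover and off-by-one mutation can be illustrated on a tiny expression tree. The sketch assumes a hypothetical `Node` type whose `kind` tag decides which subtrees are compatible, and it reads "off-by-one mutation bias" as flipping strict and non-strict comparisons or nudging integer constants by one; the slide does not spell out CASC's actual rules. Subtree regeneration and the architecture-altering operations are left out.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    kind: str      # hypothetical type tag used to decide subtree compatibility
    value: object  # operator symbol, variable name, or integer constant
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, parent: Optional[Node] = None,
         index: Optional[int] = None) -> Iterator[Tuple[Node, Optional[Node], Optional[int]]]:
    """Yield every node with its parent and its position in the parent's child list."""
    yield node, parent, index
    for i, child in enumerate(node.children):
        yield from walk(child, node, i)


def crossover(a: Node, b: Node, rng: random.Random) -> List[Node]:
    """Swap one pair of type-compatible, non-root subtrees; returns two new programs."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    sites_a = [s for s in walk(a) if s[1] is not None]
    rng.shuffle(sites_a)
    for node_a, parent_a, i_a in sites_a:
        matches = [s for s in walk(b) if s[1] is not None and s[0].kind == node_a.kind]
        if matches:
            node_b, parent_b, i_b = rng.choice(matches)
            parent_a.children[i_a], parent_b.children[i_b] = node_b, node_a
            break
    return [a, b]  # plain copies if no compatible pair exists


# Off-by-one bias: comparisons flip between strict and non-strict, constants move by 1.
OFF_BY_ONE = {"<": "<=", "<=": "<", ">": ">=", ">=": ">"}


def mutate(tree: Node, rng: random.Random) -> List[Node]:
    tree = copy.deepcopy(tree)
    mutable = [n for n, _, _ in walk(tree)
               if isinstance(n.value, int) or n.value in OFF_BY_ONE]
    if mutable:
        node = rng.choice(mutable)
        if isinstance(node.value, int):
            node.value += rng.choice((-1, 1))
        else:
            node.value = OFF_BY_ONE[node.value]
    return [tree]


def show(n: Node) -> str:
    if not n.children:
        return str(n.value)
    return f"({show(n.children[0])} {n.value} {show(n.children[1])})"


cond = Node("bool", "<", [Node("int", "i"), Node("int", "n")])  # i < n
expr = Node("int", "+", [Node("int", "x"), Node("int", 1)])     # x + 1
rng = random.Random(0)
print(show(mutate(cond, rng)[0]))  # the only mutable node in cond is "<"
print([show(t) for t in crossover(cond, expr, rng)])
```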

Reproduction Phase: Test Cases

Reproduction employs uniform crossover

Same selection method as programs

Each offspring has a chance to mutate

Genes to mutate are selected randomly

Mutated gene is randomly adjusted

– The amount adjusted is selected from a Gaussian distribution
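Test case reproduction with uniform crossover and Gaussian mutation might look like the sketch below, assuming each test case is a flat list of real-valued genes. The probabilities `p_offspring` and `p_gene` and the spread `sigma` are hypothetical defaults, not values taken from CASC; integer-valued inputs would additionally need rounding.

```python
import random
from typing import List, Sequence, Tuple


def uniform_crossover(a: Sequence[float], b: Sequence[float],
                      rng: random.Random) -> Tuple[List[float], List[float]]:
    """Each gene position independently comes from either parent with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2


def gaussian_mutate(genes: Sequence[float], rng: random.Random,
                    p_offspring: float = 0.3, p_gene: float = 0.2,
                    sigma: float = 1.0) -> List[float]:
    """With probability p_offspring, add N(0, sigma^2) noise to randomly chosen genes."""
    genes = list(genes)
    if rng.random() < p_offspring:
        for i in range(len(genes)):
            if rng.random() < p_gene:
                genes[i] += rng.gauss(0.0, sigma)
    return genes


rng = random.Random(42)
parent_a, parent_b = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
c1, c2 = uniform_crossover(parent_a, parent_b, rng)
print(c1, c2, gaussian_mutate(c1, rng, p_offspring=1.0))
```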

CASC Evolutionary Model

Evaluation Phase

All programs run against all test cases

– Full population exposure vs. population sampling

– Hash table used to avoid repeat evaluations

Each execution is scored based on the program's input and output

– Black-box style

– Run-time exceptions and time-outs are monitored

A program's fitness is the average of all its execution scores

– Test case scores are directly related to this value
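A sketch of the all-versus-all evaluation with a hash table of earlier results. Assumptions: each program is a `(source, callable)` pair whose source text serves as the cache key, `eval` stands in for compiling a generated program, and a run-time exception scores 0. Time-outs are not handled here, since in-process calls cannot be interrupted safely; they require running programs in separate processes.

```python
from typing import Callable, Dict, Sequence, Tuple

Program = Tuple[str, Callable]  # (source text used as the cache key, callable to run)


def evaluate(programs: Sequence[Program], test_cases: Sequence[Tuple],
             score: Callable[[Tuple, object], float],
             cache: Dict[Tuple[str, Tuple], float]) -> Tuple[Dict[str, float], int]:
    """Run every program on every test case; fitness is the mean execution score."""
    fitness: Dict[str, float] = {}
    executed = 0
    for source, run in programs:
        scores = []
        for case in test_cases:
            key = (source, case)
            if key not in cache:  # skip repeat evaluations
                executed += 1
                try:
                    cache[key] = score(case, run(*case))  # black box: input -> output
                except Exception:
                    cache[key] = 0.0  # assumption: a run-time exception scores 0
            scores.append(cache[key])
        fitness[source] = sum(scores) / len(scores)
    return fitness, executed


def max_score(case: Tuple, output: object) -> float:
    """Hypothetical scoring for a program meant to return the larger of two inputs."""
    return 1.0 if output == max(case) else 0.0


sources = ["a if a > b else b", "a if a < b else b", "a // b"]
programs = [(src, eval("lambda a, b: " + src)) for src in sources]
cases = [(1, 2), (3, 1), (5, 0)]
cache: Dict[Tuple[str, Tuple], float] = {}
fitness, ran = evaluate(programs, cases, max_score, cache)
fitness_again, ran_again = evaluate(programs, cases, max_score, cache)
print(fitness, ran, ran_again)
```

The second call finds every (program, test case) pair in the cache, so no program is executed again.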

CASC Evolutionary Model

CASC Implementation Details

Adaptive parameter control

– Evolutionary algorithms (EAs) typically have many control parameters

– Difficult to find optimal settings for these parameters

– In CASC, the genetic operator probabilities are adaptive parameters

– Rewarded/punished based on performance

• If one operator generates improved individuals more often than the others, it is made more likely to be used

– Allows the system to adapt to the different phases of the search
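One possible reward/punish rule for the adaptive operator probabilities, shown as a sketch: each operator's probability is blended toward its share of offspring that improved, with a small floor so no operator disappears entirely. The update rule and the `rate` and `floor` values are assumptions; the slide does not give CASC's formula.

```python
from typing import Dict


def update_operator_probs(probs: Dict[str, float], improved: Dict[str, int],
                          used: Dict[str, int], rate: float = 0.1,
                          floor: float = 0.05) -> Dict[str, float]:
    """Reward operators whose offspring improved; the others lose probability mass."""
    success = {op: improved.get(op, 0) / used[op] if used.get(op) else 0.0 for op in probs}
    total = sum(success.values())
    if total == 0:
        return dict(probs)  # no information this generation
    # Blend the current probabilities toward each operator's share of successes.
    new = {op: (1 - rate) * probs[op] + rate * success[op] / total for op in probs}
    # Raise any operator below the floor, then renormalize so the values sum to 1.
    new = {op: max(p, floor) for op, p in new.items()}
    z = sum(new.values())
    return {op: p / z for op, p in new.items()}


probs = {"copy": 1 / 3, "crossover": 1 / 3, "mutation": 1 / 3}
new = update_operator_probs(probs, improved={"crossover": 8, "mutation": 2},
                            used={"copy": 10, "crossover": 10, "mutation": 10})
print(new)
```

A small `rate` makes the probabilities drift gradually, which lets the operator mix track the different phases of the search instead of reacting to a single noisy generation.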

CASC Implementation Details

Parallel Computation

– Computational complexity is generally a problem for EAs

– CASC typically writes and compiles thousands of programs on a given run

• Typically executes millions of evaluations (literally)

– To reduce run times, executions are done in parallel (NIC cluster)

• All other evolutionary phases are done in serial

– Main node: responsible for generating and writing programs

– Worker nodes: responsible for compiling and executing programs

– Dramatically speeds up execution
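The main/worker split can be approximated on a single machine. In this sketch the main side builds the (program, input) jobs and a thread pool plays the worker nodes, each of which writes a candidate to disk and runs it in its own process with a time-out. Running the Python interpreter stands in for CASC's compile-and-execute step, nothing here reproduces the NIC cluster setup, and `run_program` and `run_all` are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def run_program(source: str, stdin_text: str, timeout: float) -> Tuple[str, str]:
    """Worker task: write one candidate program to disk, execute it, report the outcome."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "candidate.py")
        with open(path, "w") as f:
            f.write(source)
        try:
            proc = subprocess.run([sys.executable, path], input=stdin_text,
                                  capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return "timeout", ""
    if proc.returncode != 0:
        return "error", proc.stderr
    return "ok", proc.stdout


def run_all(sources: List[str], inputs: List[str], workers: int = 4,
            timeout: float = 2.0) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Main side: fan every (program, input) pair out to a pool of workers."""
    jobs = [(src, inp) for src in sources for inp in inputs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda job: run_program(job[0], job[1], timeout), jobs))
    return dict(zip(jobs, results))


sources = [
    "print(sum(map(int, input().split())))",  # runs normally
    "print(1 / 0)",                           # run-time exception
    "while True: pass",                       # never terminates
]
results = run_all(sources, ["2 3\n"])
for (src, _), (status, _) in results.items():
    print(status, src)
```

Threads are enough for the pool here because each worker spends its time waiting on a child process, so the actual executions still run in parallel.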

CASC Criticisms

Scalability

– The problem space is infinite, even for simple programs

– Software must be corrected in reasonable time, regardless of program size

Fitness Function Design

– Each new problem for CASC requires a new fitness function

– Infinitely many fitness functions are possible

– Only a limited number of them are high quality

– Designing high-quality fitness functions is extremely difficult

Scalability: ARCD

Automated Relevant Code Discovery (ARCD) System

– Preprocessor for CASC

– Uses bug localization techniques to remove irrelevant lines of code from consideration

– Ensemble of analysis methods

• Each method generates a set of suspect lines of code

• Results are combined and a relevant code set is generated

– Voting system

– Confidence levels

• Employ state-of-the-art bug localization techniques

• Exploit the availability of the fitness function

– A prototype is under development

– Three techniques currently implemented

• Positive/negative trace comparison

• Line suspicion based on fitness

• Fitness run-time plot
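The sketch below shows one way an ensemble could turn per-method suspect sets into a relevant code set. The voting threshold (`min_votes`) and the summed per-method confidence are assumptions. So is the `fitness_suspicion` heuristic, which flags lines whose covering executions have low average fitness and is only one plausible reading of "line suspicion based on fitness". The coverage sets and the third method's output are made-up inputs for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def fitness_suspicion(coverage: Dict[str, Set[int]], fitness: Dict[str, float],
                      threshold: float = 0.5) -> Set[int]:
    """Flag lines whose covering executions have low average fitness."""
    total: Dict[int, float] = defaultdict(float)
    count: Dict[int, int] = defaultdict(int)
    for test, lines in coverage.items():
        for line in lines:
            total[line] += fitness[test]
            count[line] += 1
    return {line for line in total if total[line] / count[line] < threshold}


def combine(methods: Dict[str, Tuple[float, Set[int]]],
            min_votes: int = 2) -> Tuple[Set[int], Dict[int, float]]:
    """Vote across methods and accumulate each method's confidence per line."""
    votes: Dict[int, int] = defaultdict(int)
    confidence: Dict[int, float] = defaultdict(float)
    for weight, suspects in methods.values():
        for line in suspects:
            votes[line] += 1
            confidence[line] += weight
    relevant = {line for line, v in votes.items() if v >= min_votes}
    return relevant, dict(confidence)


coverage = {"pass": {1, 2, 3, 4, 5, 10, 11, 12}, "fail": set(range(1, 13))}
fitness = {"pass": 1.0, "fail": 0.0}
methods = {
    "trace_comparison": (0.8, coverage["fail"] - coverage["pass"]),
    "fitness_suspicion": (0.6, fitness_suspicion(coverage, fitness)),
    "third_method": (0.3, {9, 11}),  # made-up output standing in for another method
}
relevant, confidence = combine(methods)
print(sorted(relevant), confidence)
```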

ARCD: Pos./Neg. Trace Comparison

    1  2  3  4  5  6  7  4  5  6  7  8  9  4  5  8  9 10 11 12
 1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 2  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 3  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 4  0  0  0  4  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0
 5  0  0  0  0  5  0  0  0  2  0  0  0  0  0  2  0  0  0  0  0
 4  0  0  0  1  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0
 5  0  0  0  0  2  0  0  0  2  0  0  0  0  0  2  0  0  0  0  0
 4  0  0  0  1  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0
 5  0  0  0  0  2  0  0  0  2  0  0  0  0  0  2  0  0  0  0  0
10  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
11  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0
12  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3

Positive Trace

Negative Trace
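The trace comparison matrix appears to follow the dynamic-programming table for longest common substrings: a cell holds the length of the matching run of executed lines ending at that row and column, so diagonal runs show where the two traces agree. The sketch below recomputes it from the traces read off the matrix, assuming the rows hold the positive trace and the columns the negative trace (the transcript does not say which is which). Reporting lines executed only in the negative trace is an illustrative heuristic, not necessarily what ARCD does.

```python
from typing import List, Sequence


def trace_match_matrix(rows: Sequence[int], cols: Sequence[int]) -> List[List[int]]:
    """Cell (i, j) = length of the common run of executed lines ending at rows[i] and cols[j]."""
    m = [[0] * len(cols) for _ in rows]
    for i, a in enumerate(rows):
        for j, b in enumerate(cols):
            if a == b:
                m[i][j] = (m[i - 1][j - 1] if i > 0 and j > 0 else 0) + 1
    return m


def lines_only_in(trace: Sequence[int], other: Sequence[int]) -> List[int]:
    """Illustrative heuristic: lines executed in `trace` but never in `other`."""
    return sorted(set(trace) - set(other))


positive = [1, 2, 3, 4, 5, 4, 5, 4, 5, 10, 11, 12]
negative = [1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 4, 5, 8, 9, 10, 11, 12]
matrix = trace_match_matrix(positive, negative)
print("   " + " ".join(f"{line:>2}" for line in negative))
for line, row in zip(positive, matrix):
    print(f"{line:>2} " + " ".join(f"{v:>2}" for v in row))
print("Executed only in the negative trace:", lines_only_in(negative, positive))
```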

ARCD: Fitness Plots

Fitness Plots: fitness (y-axis, 0 to 1.2) versus lines executed (x-axis, 0 to 70) for an incorrect program and a correct program

Scalability: CC-CoEA

Cooperative-Competitive Coevolution (CC-CoEA)

– Multiple program populations

– Cooperative coevolution of program components

– Each sub-population is focused on a specific portion of the program

– Components are selected from each population and a program is assembled

– Fitness indicates how well each component performed

– Divide the problem space into smaller, more manageable pieces

– Allow CASC to “freeze” sub-populations that are suspected to have converged
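A minimal sketch of the cooperative side of CC-CoEA: one component is drawn from each sub-population, the assembled program is scored, and each component is credited with the average score of the programs it took part in. Sub-populations whose members are all identical are treated as converged and frozen. The convergence test, the string components, and `toy_evaluate` are hypothetical, and the competing test case population is omitted.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def converged(subpop: Sequence[str]) -> bool:
    """Toy convergence test: every member of the sub-population is identical."""
    return len(set(subpop)) == 1


def cooperative_round(subpops: List[List[str]], evaluate: Callable[[List[str]], float],
                      rng: random.Random, trials: int = 30) -> List[Dict[str, float]]:
    """Assemble programs from one component per sub-population and credit each component."""
    frozen = {i: pop[0] for i, pop in enumerate(subpops) if converged(pop)}
    credit: List[Dict[str, List[float]]] = [defaultdict(list) for _ in subpops]
    for _ in range(trials):
        parts = [frozen[i] if i in frozen else rng.choice(pop)
                 for i, pop in enumerate(subpops)]
        score = evaluate(parts)
        for i, part in enumerate(parts):
            if i not in frozen:  # frozen sub-populations are no longer evolved or scored
                credit[i][part].append(score)
    return [{part: sum(s) / len(s) for part, s in c.items()} for c in credit]


def toy_evaluate(parts: List[str]) -> float:
    """Hypothetical stand-in for compiling and testing the assembled program."""
    return 1.0 if parts == ["i < n", "i += 1"] else 0.0


subpops = [["i < n"] * 4, ["i += 1", "i += 2", "i -= 1"]]
credit = cooperative_round(subpops, toy_evaluate, random.Random(3))
print(credit)
```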


Fitness Function Design

Current approach: a guide for fitness function generation

– Formalize the thought process for fitness function design

– Incorporate quality measures to ensure high-quality fitness functions

– Incorporate advanced fitness function techniques, mapped to problem characteristics (indicating when each technique will be useful)

– Extend the guide to other black-box search algorithms that use fitness functions

– Implement it as a semi-automated tool for fitness function design

Alternative approach

– Exploit formal specifications

• Information about expected program operation

• Possibly generate new, correct code from scratch

– There is no evidence that this approach will be superior

• Many open problems

• One-to-many relationships

Questions?