A Data-Driven Model for Software Reliability Prediction Author: Jung-Hua Lo IEEE International Conference on Granular Computing (2012) Young Taek Kim KAIST SE Lab. 9/4/2013


Page 1: A Data-Driven Model for Software Reliability Predictionse.kaist.ac.kr/wp-content/uploads/2013/09/A-Data-Driven... · 2013. 9. 9. · SRM (Software Reliability Model) To estimate how

A Data-Driven Model for Software Reliability Prediction

Author: Jung-Hua Lo

IEEE International Conference on Granular Computing (2012)

Young Taek Kim

KAIST SE Lab.

9/4/2013

Contents

Introduction
Background
Overall Approach
Detailed Process
Experimental Results
Conclusion
Discussion

SW Reliability Prediction

Definition of SW Reliability

Probability of failure-free operation of a software product in a specified environment for a specified time.

SRM (Software Reliability Model)

• To estimate how reliable the software is now.
• To predict the reliability in the future.

Two categories of SRMs

• Analytical Models: NHPP SRMs
• Data-Driven Models: ARIMA, SVM

Data Driven Model

Limitations of Analytical Models

• Software behavior changes during the testing phase.
• The assumption that “all faults are independent & equally detectable” is violated by the dataset.

Data Driven Models

• Far fewer impractical assumptions: the models are developed from collected failure data.
• Easy to make abstractions and generalizations of the SW failure process: the approach is regression or time series analysis.

Motivation

Problems

• Actual SW failure data sets are rarely purely linear or purely nonlinear.
• No general model is suitable for all situations.

Proposed Solution

Hybrid strategy combining a linear and a nonlinear prediction model:
• ARIMA model: good performance in predicting linear data
• SVM model: successful application to nonlinear data

Stationarity

Statistical properties (mean, variance, covariance, etc.) are all constant over time.

$$ \text{(1)}\quad E(y_t) = \mu_y \quad \text{for all } t $$
$$ \text{(2)}\quad \mathrm{Var}(y_t) = E[(y_t - \mu_y)^2] = \sigma_y^2 \quad \text{for all } t $$
$$ \text{(3)}\quad \mathrm{Cov}(y_t, y_{t-k}) = \gamma_k \quad \text{for all } t $$

[Figure: two plots of a series over time steps 1 to 11. Before differencing, the statistics of two time windows differ: (μ1, σ1², γ1) ≠ (μ2, σ2², γ2). After differencing, they agree: (μ1, σ1², γ1) = (μ2, σ2², γ2).]

ACF (Autocorrelation Function)

The correlation between observations separated by a given number of time steps (the lag k):

$$ r_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2} $$

where

$$ \bar{y} = \frac{1}{n} \sum_{t=1}^{n} y_t $$
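The sample autocorrelation at lag k (the sum of products of mean deviations k steps apart, divided by the total sum of squared deviations) can be computed directly. The following is a minimal Python sketch, not code from the paper; the function name `sample_acf` and the toy series are illustrative assumptions.

```python
def sample_acf(y, max_lag):
    """Return [r_0, r_1, ..., r_max_lag] for the series y."""
    n = len(y)
    mean = sum(y) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((v - mean) ** 2 for v in y)
    acf = []
    for k in range(max_lag + 1):
        # Numerator: products of deviations k steps apart
        # (t = k+1 .. n in 1-based indexing).
        num = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


# Hypothetical toy series, for illustration only.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_acf(series, 2))
```

By construction r_0 is always 1, so only lags k ≥ 1 carry information about the series.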

PACF (Partial ACF)

The degree of association between $y_t$ and $y_{t-k}$, when the effects of other time lags 1, 2, 3, …, k-1 are removed.

$$ r_{kk} = \begin{cases} r_1 & \text{if } k = 1 \\[4pt] \dfrac{r_k - \sum_{j=1}^{k-1} r_{k-1,j}\, r_{k-j}}{1 - \sum_{j=1}^{k-1} r_{k-1,j}\, r_j} & \text{if } k = 2, 3, \ldots \end{cases} $$

where

$$ r_{kj} = r_{k-1,j} - r_{kk}\, r_{k-1,k-j} \quad \text{for } j = 1, 2, \ldots, k-1. $$
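The partial autocorrelations can be obtained from the autocorrelations r_1, r_2, … with the standard Durbin-Levinson recursion: the lag-1 value equals r_1, and each later value is built from the previous step's coefficients. The sketch below is illustrative only; the function name `pacf_from_acf` and the example input are assumptions, not part of the paper.

```python
def pacf_from_acf(r):
    """Given r = [r_0, r_1, ..., r_K] (with r_0 = 1), return [r_11, ..., r_KK]."""
    max_lag = len(r) - 1
    pacf = []
    prev = {}  # prev[j] holds r_{k-1, j} from the previous step
    for k in range(1, max_lag + 1):
        if k == 1:
            r_kk = r[1]
        else:
            num = r[k] - sum(prev[j] * r[k - j] for j in range(1, k))
            den = 1.0 - sum(prev[j] * r[j] for j in range(1, k))
            r_kk = num / den
        # Update the intermediate coefficients r_{k, j} for j = 1 .. k-1.
        cur = {j: prev[j] - r_kk * prev[k - j] for j in range(1, k)}
        cur[k] = r_kk
        pacf.append(r_kk)
        prev = cur
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: r_k = 0.5 ** k.
acf_ar1 = [0.5 ** k for k in range(4)]
print(pacf_from_acf(acf_ar1))
```

For this AR(1)-shaped input the partial autocorrelation is nonzero only at lag 1, which is the cut-off pattern used to identify the order of an AR model.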

Removing Non-stationarity

Differencing

Differenced series: $y'_t = y_t - y_{t-1}$
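Differencing replaces each observation with its change from the previous one, and can be repeated d times. A minimal Python sketch follows; the function name `difference` and the toy series are illustrative assumptions, not data from the paper.

```python
def difference(y, d=1):
    """Apply first-order differencing d times: each value minus its predecessor."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


# Hypothetical series with a curved upward trend.
trend = [1, 3, 6, 10, 15]
print(difference(trend))       # first differences
print(difference(trend, d=2))  # second differences
```

Each round of differencing shortens the series by one observation, which matters for short data sets.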

3 Prediction Models for Stationary Data

AR (Auto Regressive) Model
• Use past values in forecast
• AR(p): $y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \varepsilon_t$

MA (Moving Average) Model
• Use past residuals (random events) in forecast
• MA(q): $y_t = \varepsilon_t + \beta_1 \varepsilon_{t-1} + \cdots + \beta_q \varepsilon_{t-q}$

ARMA (Auto Regressive & Moving Average) Model
• Combination of AR & MA
• ARMA(p, q): $y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \cdots + \beta_q \varepsilon_{t-q}$

AR (Auto Regressive) Model (1/2)

AR(p)

$$ y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \varepsilon_t $$

• $\alpha_i$: autoregressive coefficient
• $\varepsilon_t$: error at time t

Selection of a model

• ACF: decreases exponentially (a: the AR(1) coefficient)
  - Direct decay: 0 < a < 1
  - Oscillating pattern: -1 < a < 0
• PACF: identifies the order of the AR model

[Figure: autocorrelation function for an AR(1) data series (with 5% significance limits for the autocorrelations), plotted against lag: exponentially decreasing (oscillating).]

[Figure: partial autocorrelation function for an AR(1) data series (with 5% significance limits for the partial autocorrelations), plotted against lag: cuts off at lag 1, indicating AR(1).]

MA (Moving Average) Model (1/2)

MA(q)

$$ y_t = \varepsilon_t + \beta_1 \varepsilon_{t-1} + \cdots + \beta_q \varepsilon_{t-q} $$

• $\beta_i$: MA parameter
• $\varepsilon_t$: error at time t

Example

Year   Sales (B$)   MA(3)
2000   1000
2001   1500
2002   1250
2003    900         1250
2004   1600         1217
2005    950         1250
2006   1650         1150
2007   1750         1400
2008   1200         1450
2009   2000         1533
2010   2100         1650
2011                1767

MA(3) for 2003 = (1000 + 1500 + 1250) / 3 = 1250

[Figure: Sales (B$) and MA(3) plotted by year, 2000 to 2011.]
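The MA(3) column of the example can be reproduced by averaging the three preceding sales figures. The Python sketch below is illustrative; the function name `trailing_moving_average` is an assumption. Note that this is a trailing average of past observations, which is related to, but not the same as, the MA(q) error model in the equation above.

```python
def trailing_moving_average(values, window):
    """Average of the `window` values preceding each forecast position."""
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]


# Sales (B$) for 2000..2010, copied from the example table.
sales = [1000, 1500, 1250, 900, 1600, 950, 1650, 1750, 1200, 2000, 2100]

ma3 = trailing_moving_average(sales, window=3)
# The first MA(3) value is the forecast for 2003, the last one for 2011.
for year, value in zip(range(2003, 2012), ma3):
    print(year, round(value))
```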

MA (Moving Average) Model (2/2)

Selection of a model

• ACF: identifies the order of the MA model
• PACF: decreases exponentially
  - Direct decay: 0 < a < 1
  - Oscillating pattern: -1 < a < 0

[Figure: partial autocorrelation function for an MA(1) data series (with 5% significance limits for the partial autocorrelations), plotted against lag: exponentially decreasing (oscillating).]

[Figure: autocorrelation function for an MA(1) data series (with 5% significance limits for the autocorrelations), plotted against lag: cuts off at lag 1, indicating MA(1).]

ARMA Model

ARMA(p,q) = AR(p) + MA(q)

$$ y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \cdots + \beta_q \varepsilon_{t-q} $$

Procedures for model identification

▶ Guideline to determine p, q for ARMA
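Once the coefficients are known, a one-step ARMA(p, q) forecast is the weighted sum of the last p observations and the last q residuals; the current error term has zero expectation and drops out. A minimal Python sketch, with hypothetical coefficients and data chosen only for illustration:

```python
def arma_one_step(alpha, beta, past_y, past_eps):
    """One-step ARMA forecast.

    alpha: AR coefficients [alpha_1, ..., alpha_p]
    beta:  MA coefficients [beta_1, ..., beta_q]
    past_y[-1] is the most recent observation, past_eps[-1] the most recent residual.
    """
    ar_part = sum(a * past_y[-i] for i, a in enumerate(alpha, start=1))
    ma_part = sum(b * past_eps[-j] for j, b in enumerate(beta, start=1))
    return ar_part + ma_part


# Hypothetical ARMA(2, 1) coefficients and history.
forecast = arma_one_step(alpha=[0.5, 0.2], beta=[0.3],
                         past_y=[10.0, 12.0], past_eps=[1.0])
print(forecast)
```

Setting `beta=[]` reduces this to an AR(p) forecast, and `alpha=[]` to an MA(q) forecast.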

ARIMA Model

Auto Regressive Integrated Moving Average (by Box and Jenkins, 1970)

Linear model for forecasting time series data: a future value is a linear function of several past observations.

ARIMA(p, d, q)

• p: auto regression of order p
• d: integrated, i.e., differencing of order d (extends the model to non-stationary time series)
• q: moving average of order q

SVM (Support Vector Machine)

Proposed by Vladimir N. Vapnik (1995, Rus)

An algorithm (or recipe) for maximizing a particular mathematical function with respect to a given collection of data

4 Key Concepts:

• Separating hyperplane
• Maximum-margin hyperplane
• Soft margin
• Kernel function

Separating Hyperplane

Separating Hyperplane (= Classifier)

$$ f(x, w, b) = \operatorname{sign}(w \cdot x + b) $$

• $w \cdot x + b > 0$: class +1
• $w \cdot x + b < 0$: class -1

[Figure: two classes of points (+1 and -1) divided by a separating hyperplane.]
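The classifier is just the sign of a dot product plus an offset. A minimal Python sketch follows; the weights, offset, and points are hypothetical values chosen for illustration, and points lying exactly on the hyperplane are assigned to class -1 here by convention.

```python
def classify(x, w, b):
    """Return +1 or -1 according to the sign of (w . x + b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Points exactly on the hyperplane (score == 0) are assigned -1 here.
    return 1 if score > 0 else -1


# Hypothetical hyperplane x1 + x2 = 3 in two dimensions.
w, b = [1.0, 1.0], -3.0
print(classify([3.0, 2.0], w, b))  # above the line
print(classify([1.0, 1.0], w, b))  # below the line
```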

Maximum Margin

$$ f(x, w, b) = \operatorname{sign}(w \cdot x + b) $$

• Support vectors are those data points that the margin pushes up against.
• Only support vectors are used to specify the separating hyperplane.

[Figure: the two classes (+1 and -1) with the maximum-margin hyperplane; labels x+, x- and M = margin width.]

Kernel Function (1/2)

Nonlinear SVMs

• Datasets that are linearly separable with some noise work out great.
• But what are we going to do if the dataset is just too hard?
• How about mapping the data to a higher-dimensional space?

[Figure: one-dimensional data on the x axis, and the same data mapped to the (x, x²) plane.]

Kernel Function (2/2)

Nonlinear SVMs: Feature Spaces

General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is linearly separable.

Φ: x → φ(x)

Definition of Kernel Function: some function that corresponds to an inner product in some expanded feature space.
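A small worked example may make the mapping concrete. The Python sketch below uses the mapping of a one-dimensional input x to the pair (x, x²); the data points, the hyperplane, and the function names are assumptions chosen for illustration, not taken from the paper. It also checks that a kernel function gives the same value as the inner product of the mapped points.

```python
def phi(x):
    """Map a 1-D input to the 2-D feature space (x, x^2)."""
    return (x, x * x)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kernel(x, z):
    """Inner product of phi(x) and phi(z), computed without calling phi."""
    return x * z + (x * x) * (z * z)


# Hypothetical 1-D data: class +1 lies far from the origin, class -1 near it,
# so no single threshold on x separates the two classes.
points = [(-3.0, 1), (-2.0, 1), (-1.0, -1), (0.5, -1), (2.5, 1)]

# In feature space, the line x^2 = 2.25 separates them: w = (0, 1), b = -2.25.
w, b = (0.0, 1.0), -2.25
for x, label in points:
    predicted = 1 if dot(w, phi(x)) + b > 0 else -1
    assert predicted == label

# The kernel agrees with the explicit inner product in feature space.
assert kernel(2.0, 3.0) == dot(phi(2.0), phi(3.0))
```

The practical point of a kernel is that the inner product can be evaluated without ever constructing the higher-dimensional vectors.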

Genetic Algorithm

• Search & optimization technique
• By J. Holland, 1975
• Based on Darwin’s Principle of Natural Selection
• Basic operations: crossover, mutation

Flowchart:
(1) Create an initial, random population (potential solutions).
(2) Evaluate the fitness of each member of the population.
(3) Optimal or "good" solution found? If yes, END.
(4) If no: selection (or kill population), then crossover, then mutation; return to step (2).
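The loop of fitness evaluation, selection, crossover, and mutation can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the configuration used in the paper: candidates are single real numbers, crossover is the average of two parents, the stop criterion is a fixed number of generations, and the best candidate is always carried over unchanged (elitism). All names and parameter values are hypothetical.

```python
import random


def genetic_search(fitness, low, high, pop_size=20, generations=40, seed=0):
    """Maximize `fitness` over [low, high] with a minimal genetic algorithm."""
    rng = random.Random(seed)
    # Create an initial, random population of candidate solutions.
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # Evaluate fitness and rank candidates (higher is better).
        ranked = sorted(population, key=fitness, reverse=True)
        best_history.append(fitness(ranked[0]))
        # Selection: keep the fitter half as parents, discard the rest.
        parents = ranked[: pop_size // 2]
        # Elitism: the current best candidate survives unchanged.
        children = [ranked[0]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2.0                    # crossover
            if rng.random() < 0.2:                   # mutation
                child += rng.gauss(0.0, 0.1 * (high - low))
            children.append(min(high, max(low, child)))
        population = children
    best = max(population, key=fitness)
    return best, best_history


# Toy fitness function with its maximum at x = 2 (illustrative only).
best, history = genetic_search(lambda x: -(x - 2.0) ** 2, -10.0, 10.0)
print(best)
```

Because the best candidate is always retained, the best fitness recorded in `history` never decreases from one generation to the next.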

Overall Approach (1/2)

[Flowchart: two pipelines whose outputs are added to give the Software Reliability Prediction.

ARIMA (linear forecasting): Data Set → Model Identification → Model Estimation → Model checking satisfied? (No: repeat; Yes: Trained ARIMA Model).

Support Vector Machines (nonlinear forecasting): Nonlinear Residual and Initial Parameters → Random Initial Population (Chromosome 1, Chromosome 2, ..., Chromosome N) → Training SVM Model → Trained SVM Model → Fitness Evaluation → Stop Criteria? (No: Genetic Operations, then repeat; Yes: Trained SVM Model).

Trained ARIMA Model (Linear Forecasting) + Trained SVM Model (Nonlinear Forecasting) → Software Reliability Prediction.]

Overall Approach (2/2)

$$ X_t = L_t + N_t $$

• $X_t$: time series data
• $L_t$: linear part of the time series data
• $N_t$: nonlinear part of the time series data

After ARIMA model processing, we obtain $\hat{L}_t$ and $\varepsilon_t$:

• $\hat{L}_t$: predicted value of the ARIMA model
• $\varepsilon_t$: residual at time t from the linear model, $\varepsilon_t = X_t - \hat{L}_t$

Finally, the residuals ($\varepsilon_t$) are modeled by the SVM model with GA (Genetic Algorithm).
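The decomposition can be sketched as a small pipeline: fit a linear model, take the residuals (observation minus linear prediction), model the residuals separately, and add the two forecasts. In the Python sketch below both models are deliberately simple stand-ins, labeled as such: a straight line through the end points takes the place of ARIMA, and the mean of recent residuals takes the place of the GA-tuned SVM. All function names and data are hypothetical.

```python
def line_through_ends(x):
    """Stand-in for the ARIMA model: a straight line through the first and last points.

    Returns the fitted values for each time step and the forecast for the next step.
    """
    n = len(x)
    slope = (x[-1] - x[0]) / (n - 1)
    fitted = [x[0] + slope * t for t in range(n)]
    return fitted, x[0] + slope * n


def mean_of_recent(residuals, window=2):
    """Stand-in for the SVM model: average of the most recent residuals."""
    recent = residuals[-window:]
    return sum(recent) / len(recent)


def hybrid_forecast(x, fit_linear, fit_nonlinear):
    """One-step forecast: linear forecast plus forecast of the residual."""
    fitted, linear_next = fit_linear(x)
    # Residuals: what the linear model leaves unexplained.
    residuals = [xt - lt for xt, lt in zip(x, fitted)]
    nonlinear_next = fit_nonlinear(residuals)
    return linear_next + nonlinear_next, residuals


# Hypothetical series, for illustration only.
forecast, residuals = hybrid_forecast([2.0, 5.0, 7.0, 8.0],
                                      line_through_ends, mean_of_recent)
print(forecast, residuals)
```

Passing the two models in as functions keeps the combination step independent of how each part is actually fitted.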

ARIMA Process (1/2)

Flow: Data Set → Model Identification → Parameter Estimation → Model checking satisfied? (No: repeat; Yes: SW Reliability Prediction)

Model Identification

• Stationarize the input data
  - Differencing, determine d
  - ACF, PACF checking
• Determine the values of p and q
  - ACF, PACF checking

        MA(q)          AR(p)          ARMA(p,q)
ACF     Cuts after q   Tails off      Tails off
PACF    Tails off      Cuts after p   Tails off

Parameter Estimation

• MLE (Maximum Likelihood Estimation)
  - Find a set of parameters θ1, θ2, ..., θk to maximize L(θ1, θ2, ..., θk) = f(x1, x2, ..., xN; θ1, θ2, ..., θk)
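As a small illustration of maximum likelihood estimation, consider the simplest case: a zero-mean AR(1) model with Gaussian errors, with the likelihood taken conditional on the first observation. Under those assumptions, maximizing the likelihood over the coefficient is equivalent to least squares. The Python sketch below is a simplification for illustration, not the estimation procedure of the paper; the data and function names are hypothetical.

```python
import math


def conditional_log_likelihood(y, alpha, sigma):
    """Gaussian log-likelihood of a zero-mean AR(1), conditional on y[0]."""
    n = len(y) - 1
    sse = sum((y[t] - alpha * y[t - 1]) ** 2 for t in range(1, len(y)))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)


def ar1_conditional_mle(y):
    """Coefficient that maximizes the conditional likelihood (least squares)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


# Hypothetical series, for illustration only.
y = [8.0, 4.5, 2.0, 1.2]
alpha_hat = ar1_conditional_mle(y)
print(alpha_hat)

# Nearby coefficient values give a lower likelihood (sigma held fixed at 1).
assert conditional_log_likelihood(y, alpha_hat, 1.0) > conditional_log_likelihood(y, alpha_hat + 0.05, 1.0)
assert conditional_log_likelihood(y, alpha_hat, 1.0) > conditional_log_likelihood(y, alpha_hat - 0.05, 1.0)
```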

ARIMA Process (2/2)

Flow: Data Set → Model Identification → Parameter Estimation → Model checking satisfied? (No: repeat; Yes: SW Reliability Prediction)

Model Checking

• Residual randomness check
  - The residuals of a well-fitted model will be random and follow the normal distribution.
  - Check ACF and PACF.

SVM Process (1/2)

o Due to the characteristics of the input data (randomness), a random initial population is selected.
  - ex: the parameters C, ε, σ
o The data set is divided into two parts: training & testing data.

[Flowchart: Nonlinear Residual and Initial Parameters → Random Initial Population (Chromosome 1, Chromosome 2, ..., Chromosome N) → Training SVM Model → Trained SVM Model → Fitness Evaluation → Stop Criteria? (No: Genetic Operations, then repeat; Yes: Trained SVM Model for Nonlinear Forecasting).]

SVM Process (2/2)

o The higher the fitness value, the greater the chance of survival.
o Candidate chromosomes with high fitness values are retained and combined to produce new offspring.
o GA is applied to the SVM parameter search:
  - There is no theoretical method for determining a kernel function and its parameters.
  - There is no a priori knowledge for setting the parameter C.
o Applied GA operations:
  - Crossover operation
  - Mutation operation

Experimental Results (1/2)

Collected data: cumulative number of failures, $x_i$, at time $t_i$

Data Set (DS-1)
• RADC (Rome Air Development Center) Project reported by Musa
• 21 weeks tested, 136 observed failures

Output: predicted value, $x_{i+1}$, using $(x_1, x_2, \ldots, x_i)$

[Figures: goodness-of-fit curves; relative error curves.]

Experimental Results (2/2)

Collected data: cumulative number of failures, $x_i$, at time $t_i$

Data Set (DS-2)
• 28 weeks SW test, 234 observed failures

Output: predicted value, $x_{i+1}$, using $(x_1, x_2, \ldots, x_i)$

[Figures: goodness-of-fit curves; relative error curves.]
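The evaluation setup, predicting the next cumulative failure count from all earlier counts and then measuring the error, can be sketched as a walk-forward loop. The slides do not give the relative error formula, so the definition used here (absolute error divided by the actual value) is an assumption. The predictor is a naive stand-in rather than the hybrid model, and the failure counts are hypothetical, not DS-1 or DS-2.

```python
def relative_error(predicted, actual):
    """Assumed definition: absolute error relative to the actual value."""
    return abs(predicted - actual) / actual


def walk_forward(x, predict_next, start=3):
    """For each i >= start, predict x[i] from x[:i] and record the relative error."""
    errors = []
    for i in range(start, len(x)):
        errors.append(relative_error(predict_next(x[:i]), x[i]))
    return errors


def last_increment(history):
    """Naive stand-in predictor: repeat the most recent increment."""
    return history[-1] + (history[-1] - history[-2])


# Hypothetical cumulative failure counts per week.
cumulative = [5, 12, 20, 26, 30, 32]
errors = walk_forward(cumulative, last_increment)
print(errors)
```

Any model with the same "history in, next value out" shape can be dropped in place of `last_increment` for comparison.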

Conclusion

Proposed a hybrid methodology for forecasting software reliability:
• exploits the unique strengths of the ARIMA model and the SVM model

Test results
• showed improvement in prediction performance

Discussion

Pros
• Provides a possible solution to SRM selection difficulties
• Improves SW reliability prediction performance

Cons
• Does not present detailed test methods (e.g., stop criteria for SVM, parameter estimation criteria for ARIMA, etc.)

Thank you!