The Application of Genetic Programming

lujing

Upload: jett

Post on 15-Jan-2016


Page 1: The Application Of Genetic Programming

lujing

Page 2: The Application Of Genetic Programming

1. Time Series Forecasting for Dynamic Environments : The DyFor Genetic Program Model

2. Forecasting time series using a methodology based on autoregressive integrated moving average and genetic programming

3. Forecasting nonlinear time series of energy consumption using a hybrid dynamic model

4. Genetic programming-based voice activity detection

Page 3: The Application Of Genetic Programming

Abstract

Review of Existing Time Series Forecasting Methods

The DyFor GP Model

Page 4: The Application Of Genetic Programming

Several studies have applied genetic programming (GP) to the task of forecasting with favorable results. However, these studies, like those applying other techniques, have assumed a static environment.

Page 5: The Application Of Genetic Programming

If a time series is produced in a nonstatic environment, frequently only the recent historical data that correspond to the current environment are analyzed and historical data that come from previous environments are ignored.

Page 6: The Application Of Genetic Programming

This study investigates the development of a new “dynamic” GP model that is specifically tailored for forecasting in nonstatic environments.

This Dynamic Forecasting Genetic Program (DyFor GP) model incorporates features that allow it to adapt to changing environments automatically as well as retain knowledge learned from previously encountered environments.

Page 7: The Application Of Genetic Programming

The DyFor GP model is tested for forecasting efficacy on both simulated and actual time series including the U.S. Gross Domestic Product and Consumer Price Index Inflation.

Page 8: The Application Of Genetic Programming

Classical Methods:
1) exponential smoothing methods;
2) regression methods;
3) autoregressive integrated moving average (ARIMA) methods;
4) threshold methods;
5) generalized autoregressive conditionally heteroskedastic (GARCH) methods.

Page 9: The Application Of Genetic Programming

Modern Heuristic Methods:
1) methods based on neural networks (NNs);
2) methods based on evolutionary computation (GP).

Page 10: The Application Of Genetic Programming

As discussed in Section II, existing forecasting methods rely, to some degree, on human judgment to designate an appropriate analysis window (i.e., the correct number of historical data to be analyzed).

Page 11: The Application Of Genetic Programming

Consider the following example. Suppose the time series given in Fig. 4 is to be analyzed and forecast.

Page 12: The Application Of Genetic Programming

As depicted in the figure, this time series consists of two segments each with a different underlying data generating process.

The first segment’s process represents an older environment that no longer exists but may contain patterns that can be learned and exploited when forecasting the current environment.

The second segment’s underlying process represents the current environment and is valid for forecasting future data.

Page 13: The Application Of Genetic Programming

The DyFor GP adjusts the size of its analysis window automatically in the following way.

1) Select two initial window sizes, one of size n and one of size n+i, where n and i are positive integers.

2) Run dynamic generations at the beginning of the historical data with window sizes n and n+i, use the best solution from each of these two independent runs to predict a number of future data points, and measure their predictive accuracy.

Page 14: The Application Of Genetic Programming

3) Select another two window sizes based on which window size had better accuracy. For example, if the smaller of the two window sizes (size n) predicted more accurately, then choose two new window sizes, one of size n and one of size n-i. If the larger of the two window sizes (size n+i) predicted more accurately, then choose window sizes n+i and n+2i.

Page 15: The Application Of Genetic Programming

4) Slide the analysis window to include the next time series observation. Use the two selected window sizes to run another two dynamic generations, predict future data, and measure their prediction accuracy.

5) Repeat the previous two steps until the analysis window reaches the end of historical data.
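The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `mean_forecaster` is a hypothetical stand-in for a dynamic GP generation (it simply predicts the window mean), accuracy is measured on a single next observation rather than on several future points, and ties are resolved toward expansion. None of these choices comes from the DyFor GP paper.

```python
from typing import Callable, List, Sequence, Tuple


def mean_forecaster(window: Sequence[float]) -> float:
    """Hypothetical stand-in for one dynamic GP generation: predict the window mean."""
    return sum(window) / len(window)


def next_window_sizes(n: int, i: int, err_small: float, err_large: float) -> Tuple[int, int]:
    """Step 3: choose the next pair of window sizes from the pair (n, n + i)."""
    if err_small < err_large:
        smaller = max(1, n - i)   # contract to (n - i, n), never below size 1
    else:
        smaller = n + i           # expand to (n + i, n + 2i); ties expand (assumption)
    return smaller, smaller + i


def adapt_window(
    series: Sequence[float],
    n: int,
    i: int,
    forecaster: Callable[[Sequence[float]], float] = mean_forecaster,
) -> List[int]:
    """Steps 2-5: slide over the series and record the smaller window size at each step."""
    history = []
    t = n + i                     # first point with enough history for both windows
    while t < len(series):
        small, large = min(n, t), min(n + i, t)   # windows cannot exceed the history
        err_small = abs(series[t] - forecaster(series[t - small:t]))
        err_large = abs(series[t] - forecaster(series[t - large:t]))
        history.append(n)
        n, _ = next_window_sizes(n, i, err_small, err_large)
        t += 1                    # Step 4: slide to include the next observation
    return history


# Two segments with different underlying generating processes.
series = [0.0] * 10 + [10.0] * 10
sizes = adapt_window(series, n=2, i=1)
```

On this toy series the recorded window size grows while the window stays inside the first segment, shrinks once the window spans both segments, and grows again after it has moved fully into the second segment.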

Page 16: The Application Of Genetic Programming
Page 17: The Application Of Genetic Programming

However, after several window slides, when the data analysis window spans data from both the first and second segments, it is likely that the window adjustment reverses direction. Figs. 7 and 8 show this phenomenon.

Page 18: The Application Of Genetic Programming
Page 19: The Application Of Genetic Programming

In Fig. 7, win1 and win2 have sizes of 4 and 5, respectively. Because the prediction data, pred, lie inside the second segment, it is likely that the dynamic generation involving analysis window win1 has better prediction accuracy than that involving win2, since win1 includes less data produced by a process that is no longer in effect. If this is so, the two new window sizes selected for win1 and win2 are 3 and 4, respectively. Thus, as the analysis window slides to incorporate the next time series value, it also contracts to include a smaller number of inappropriate data. Fig. 8 shows this contraction.

Page 20: The Application Of Genetic Programming

After the data analysis window slides past the end of the first segment, it is likely to expand again to encompass a greater number of appropriate data. Figs. 9 and 10 depict this expansion.

Page 21: The Application Of Genetic Programming
Page 22: The Application Of Genetic Programming

As illustrated in the above example, the DyFor GP uses predictive accuracy to adapt the size of its analysis window automatically.

When the underlying process is stable (i.e., the analysis window is contained inside a single segment), the window size is likely to expand.

When the underlying process shifts (i.e., the analysis window spans more than one segment), the window size is likely to contract.

Page 23: The Application Of Genetic Programming

Abstract

ARIMA Model

Hybrid Forecasting Model

The Model Development

Page 24: The Application Of Genetic Programming

The autoregressive integrated moving average (ARIMA), which is a conventional statistical method, is employed in many fields to construct models for forecasting time series. Although ARIMA can be adopted to obtain a highly accurate linear forecasting model, it cannot accurately forecast nonlinear time series.

Page 25: The Application Of Genetic Programming

This study proposes a hybrid forecasting model for nonlinear time series by combining ARIMA with genetic programming (GP). Finally, some real data sets are adopted to demonstrate the effectiveness of the proposed forecasting model.

Page 26: The Application Of Genetic Programming

Box and Jenkins presented the ARIMA model in 1970. The method has been widely used in financial, economic, and social scientific fields.

In the ARIMA(p, d, q) model, p is the order of auto-regression, d is the order of differencing, and q is the order of the moving average process.

Generally speaking, the ARIMA model can be represented as a linear combination of the past observations and past errors as follows:

Page 27: The Application Of Genetic Programming

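$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)(1 - B)^d y_t = c + (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\,\varepsilon_t$$

where $y_t$ is the actual value, $B$ is the backward shift operator, $c$ is the constant term, $\varepsilon_t$ is the random error at time $t$, and $\phi_p$ and $\theta_q$ are the coefficients of the model, which can be estimated using the least-squares method.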
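As a small illustration of the differencing order d, the sketch below applies the operator (1 - B) to a series d times, where B shifts the series back by one step. The function name `difference` and the sample data are hypothetical and chosen only for this example.

```python
import numpy as np


def difference(y, d):
    """Apply (1 - B)^d to y: each pass replaces y_t with y_t - y_{t-1}."""
    y = np.asarray(y, dtype=float)
    for _ in range(d):
        y = y[1:] - y[:-1]
    return y


# A quadratic trend becomes constant after differencing twice (d = 2).
quadratic = [1, 4, 9, 16, 25]
once = difference(quadratic, 1)
twice = difference(quadratic, 2)
```

Each pass shortens the series by one observation, which is why a model with differencing order d needs at least d + 1 data points before any autoregressive or moving-average terms can be fitted.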

Page 28: The Application Of Genetic Programming

Several investigations have developed some hybrid forecasting models that combine different methods to reduce the forecast error.

Page 29: The Application Of Genetic Programming

The hybrid models can be expressed as follows:

$$y_t = L_t + N_t \qquad (1)$$

where $y_t$ represents the original positive time series at time $t$, $L_t$ represents the linear component, and $N_t$ is the nonlinear component of the model.

Page 30: The Application Of Genetic Programming

The residuals can be obtained using the ARIMA model:

$$r_t = y_t - \hat{L}_t \qquad (2)$$

where $r_t$ is estimated using nonlinear methods such as GP, and $\hat{L}_t$ is the forecast value of $L_t$, estimated using the ARIMA model.

Page 31: The Application Of Genetic Programming

Accordingly, the residual can be rewritten as follows:

$$r_t = f(r_{t-1}, r_{t-2}, \ldots, r_{t-n}) + \varepsilon_t \qquad (3)$$

where $f(r_{t-1}, r_{t-2}, \ldots, r_{t-n})$ represents the nonlinear function that is constructed using GP and $\varepsilon_t$ is the random error term. The hybrid model for forecasting time series is:

$$\hat{y}_t = \hat{L}_t + \hat{N}_t \qquad (4)$$

Page 32: The Application Of Genetic Programming

This study proposes a novel hybrid forecasting model that combines ARIMA, to model the linear component ($L_t$) of a time series, with GP, to model the nonlinear component ($N_t$), in order to improve the accuracy of ARIMA forecasting.

Page 33: The Application Of Genetic Programming

The proposed hybrid approach is as follows:

Step 1: The ARIMA model is utilized to model the linear component of the time series. That is, $\hat{L}_t$ is obtained by using the ARIMA model.

Step 2: From Step 1, the residuals from the ARIMA model can be obtained. The residuals are modeled by the GP model in Eq. (3). That is, $\hat{N}_t$ is the forecast value of Eq. (3) obtained by using GP.

Step 3: Using Eq. (4), forecasts of the hybrid model are obtained by adding the forecast values of the linear and nonlinear components yielded in Step 1 and Step 2, respectively.
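The three steps can be sketched as follows. To stay short and self-contained, this sketch replaces both models with simple stand-ins, which is an assumption and not the method of the study: the linear component is a plain autoregression fitted by least squares (in place of a full ARIMA model), and the nonlinear residual function f is a least-squares fit on lagged residuals and their squares (in place of a GP-evolved expression). All function names are hypothetical.

```python
import numpy as np


def lagged(x, n_lags):
    """Rows [x[t-1], ..., x[t-n_lags]] together with the matching targets x[t]."""
    cols = [x[n_lags - k: len(x) - k] for k in range(1, n_lags + 1)]
    return np.column_stack(cols), x[n_lags:]


def fit_linear(y, p):
    """Step 1 stand-in: AR(p) by least squares; returns fitted values L_hat."""
    X, target = lagged(y, p)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return A @ coef


def fit_nonlinear(r, n):
    """Step 2 stand-in for GP: regress r_t on lagged residuals and their squares."""
    X, target = lagged(r, n)
    A = np.column_stack([np.ones(len(X)), X, X ** 2])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return A @ coef


def hybrid_fit(y, p=2, n=2):
    L_hat = fit_linear(y, p)        # Step 1: linear component
    r = y[p:] - L_hat               # residuals, r_t = y_t - L_hat_t
    N_hat = fit_nonlinear(r, n)     # Step 2: nonlinear component from lagged residuals
    y_hat = L_hat[n:] + N_hat       # Step 3: hybrid fit, y_hat_t = L_hat_t + N_hat_t
    return y[p + n:], L_hat[n:], y_hat


t = np.arange(120.0)
y = 10 + 0.05 * t + np.sin(0.2 * t) ** 3
actual, linear_only, hybrid = hybrid_fit(y)
sse_linear = float(np.sum((actual - linear_only) ** 2))
sse_hybrid = float(np.sum((actual - hybrid) ** 2))
```

The comparison here is in-sample only. Because the residual model is itself a least-squares fit that includes the zero function, the hybrid fit cannot have a larger in-sample squared error than the linear fit alone; this says nothing about out-of-sample accuracy, which is what the study evaluates on real data sets.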

Page 34: The Application Of Genetic Programming

Abstract

Energy Consumption Models

Hybrid Dynamic Grey Forecasting

Page 35: The Application Of Genetic Programming

Energy consumption is an important index of the economic development of a country. Rapid changes in industry and the economy strongly affect energy consumption.

Although traditional statistical approaches yield accurate forecasts of energy consumption, they may suffer from several limitations such as the need for large data sets and the assumption of a linear formula.

This work describes a novel hybrid dynamic approach that combines a dynamic grey model with genetic programming to forecast energy consumption.

Page 36: The Application Of Genetic Programming

3.2.1. GM(1,1) forecasting model

This model can be constructed as follows:

Step 1: Obtain positive time-series data as follows:

$$y^{(0)} = [y^{(0)}(1), y^{(0)}(2), y^{(0)}(3), \ldots, y^{(0)}(n)], \quad n \ge 4$$

Step 2: Apply the accumulated generating operator (AGO) to the original time-series data (i.e., $y^{(0)}$) to obtain the accumulated time series $y^{(1)}$ as follows:

$$y^{(1)} = [y^{(1)}(1), y^{(1)}(2), y^{(1)}(3), \ldots, y^{(1)}(n)]$$

where

$$y^{(1)}(1) = y^{(0)}(1), \qquad y^{(1)}(n) = \sum_{m=1}^{n} y^{(0)}(m)$$

Page 37: The Application Of Genetic Programming

Step 3: Construct GM(1,1) using the grey differential equation

$$y^{(0)}(t) + a z^{(1)}(t) = u$$

where $a$ and $u$ denote the grey parameters of the GM(1,1) model, and $z^{(1)}(t)$ represents the average of $y^{(1)}(t)$ and $y^{(1)}(t-1)$. The grey parameters of the grey differential equation can be estimated using the ordinary least squares (OLS) method.

Page 38: The Application Of Genetic Programming

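Step 4: Substitute the estimated parameters ($\hat{a}$ and $\hat{u}$) into the grey differential equation and then obtain the GM(1,1) forecasting equation using the inverse AGO (IAGO) technique, in the following exponential form:

$$\hat{y}^{(0)}(t) = \hat{y}^{(1)}(t) - \hat{y}^{(1)}(t-1) = \left(y^{(0)}(1) - \frac{\hat{u}}{\hat{a}}\right) e^{-\hat{a}(t-1)} \left(1 - e^{\hat{a}}\right), \quad t = 2, 3, \ldots$$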
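Steps 1 to 4 of GM(1,1) can be sketched with NumPy as below. The function names and the sample data are hypothetical, and the forecasting formula is the standard GM(1,1) exponential form, taken here as the assumed reading of Step 4.

```python
import numpy as np


def gm11_fit(y0):
    """Estimate the grey parameters (a, u) of y0(t) + a * z1(t) = u by OLS."""
    y0 = np.asarray(y0, dtype=float)
    y1 = np.cumsum(y0)                        # Step 2: AGO (running sum)
    z1 = 0.5 * (y1[1:] + y1[:-1])             # Step 3: mean of y1(t) and y1(t-1)
    B = np.column_stack([-z1, np.ones(len(z1))])
    (a, u), *_ = np.linalg.lstsq(B, y0[1:], rcond=None)
    return a, u


def gm11_predict(y0_first, a, u, t):
    """Step 4: forecast y0 at 1-indexed time t >= 2 (undefined when a == 0)."""
    return (y0_first - u / a) * np.exp(-a * (t - 1)) * (1.0 - np.exp(a))


# Six observations growing by 5 percent per period (illustrative data only).
data = 100.0 * 1.05 ** np.arange(6)
a, u = gm11_fit(data)
forecast_7 = gm11_predict(data[0], a, u, t=7)
true_7 = 100.0 * 1.05 ** 6
```

For a series that grows geometrically, the fitted development coefficient a is negative and the seventh-period forecast lands very close to the true continuation of the series. A constant series gives a = 0, which this sketch does not handle.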

Page 39: The Application Of Genetic Programming

3.2.2. Dynamic GM(1,1) model

Some studies have developed dynamic GM(1,1) models (DGM(1,1)) to increase the forecasting accuracy of GM(1,1).

In the DGM(1,1) model, $\hat{y}^{(0)}(k+1)$ is predicted using GM(1,1) and $y^{(0)} = [y^{(0)}(1), y^{(0)}(2), y^{(0)}(3), \ldots, y^{(0)}(k)]$, where $k < n$. Following the determination of $\hat{y}^{(0)}(k+1)$, $y^{(0)}(k+1)$ is added to the original time series, and $y^{(0)}(1)$ is removed from the original time series to yield a new series

$$y_1^{(0)} = [y^{(0)}(2), y^{(0)}(3), y^{(0)}(4), \ldots, y^{(0)}(k+1)]$$

Page 40: The Application Of Genetic Programming

The predicted value of $\hat{y}^{(0)}(k+2)$ can be obtained using the new series $y_1^{(0)}$. The evaluation procedure is continued to obtain $\hat{y}^{(0)}(k+l)$ for $l = 3, 4, 5, \ldots, n-k-1$.

Page 41: The Application Of Genetic Programming

This section describes a novel nonlinear hybrid dynamic forecasting model that combines the dynamic grey model with GP. The proposed model is derived as follows:

Page 42: The Application Of Genetic Programming

Step 1: Assume that the original time series of energy consumption data is $y_t$ ($n$ data points), and that $\hat{y}_t$ is predicted using a novel DGM(1,1) model (NDGM(1,1)). Because GM(1,1) requires at least four data points to construct the forecasting model, in the first rolling, $\hat{y}^{(0)}(k+1)$ can be determined from the series

$$(y^{(0)}(k-3), y^{(0)}(k-2), y^{(0)}(k-1), y^{(0)}(k))$$

Page 43: The Application Of Genetic Programming

In the second rolling, $\hat{y}^{(0)}(k+2)$ can be determined from

$$(y^{(0)}(k-2), y^{(0)}(k-1), y^{(0)}(k), y^{(0)}(k+1))$$

Moreover, in each rolling cycle, the newly predicted values of the original data, $(\hat{y}^{(0)}(k+1), \hat{y}^{(0)}(k+2), \ldots)$, are determined using the GM(1,1) model. The residual series of the NDGM(1,1) model can be expressed as

$$e_t = y_t - \hat{y}_t$$
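The rolling procedure can be sketched as below: each cycle fits GM(1,1) to the four most recent observations, predicts the next point, and the residual is the difference between the observed and predicted values. The helper names and the sample data are hypothetical, and the four-point window made of observed values is the assumed reading of the rolling scheme.

```python
import numpy as np


def gm11_next(window):
    """Fit GM(1,1) to the window and forecast the value that follows it."""
    w = np.asarray(window, dtype=float)
    y1 = np.cumsum(w)                              # AGO
    z1 = 0.5 * (y1[1:] + y1[:-1])                  # background values
    B = np.column_stack([-z1, np.ones(len(z1))])
    (a, u), *_ = np.linalg.lstsq(B, w[1:], rcond=None)
    t = len(w) + 1                                 # 1-indexed position of the forecast
    # Undefined for a == 0 (for example, a constant window).
    return (w[0] - u / a) * np.exp(-a * (t - 1)) * (1.0 - np.exp(a))


def rolling_forecast(y, k=4):
    """One forecast per rolling cycle, each from the k most recent observations."""
    y = np.asarray(y, dtype=float)
    preds = np.array([gm11_next(y[t - k:t]) for t in range(k, len(y))])
    residuals = y[k:] - preds                      # e_t = y_t - y_hat_t
    return preds, residuals


# Ten observations growing by 10 percent per period (illustrative data only).
y = 50.0 * 1.1 ** np.arange(10)
preds, residuals = rolling_forecast(y)
```

With ten observations and a four-point window there are six rolling cycles, so six forecasts and six residuals. These residuals are the series that the GP stage is then asked to model.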

Page 44: The Application Of Genetic Programming

Step 2: In each rolling cycle of NDGM(1,1), construct the model for forecasting the error using the nonlinear function $f$, determined by GP, as follows:

$$r_{i,j} = f(r_{i,j-1}, r_{i,j-2}, r_{i,j-3}, r_{i,j-4}) + \varepsilon_{i,j}, \quad (i, j) = (1, 5), (2, 6), \ldots, (n-4, n)$$

where $r_{i,j}$ denotes the $j$th point estimate of NDGM(1,1) that is conditioned in the $i$th rolling cycle; the series $(r_{i,j-1}, r_{i,j-2}, r_{i,j-3}, r_{i,j-4})$ represents the errors of the $i$th rolling cycle, which can be obtained using the GM(1,1) model in the four periods; and $\varepsilon_{i,j}$ represents a random error.

Page 45: The Application Of Genetic Programming

In the GP model, the input variables are the lagging residual series $(r_{i,j-1}, r_{i,j-2}, r_{i,j-3}, r_{i,j-4})$ and the output variable is $r_{i,j}$.

To reduce the forecasting error, the fitness function in GP is defined as follows:

$$\text{Minimize: } \sum_{j=5}^{n} \left| r_{i,j} - \hat{r}_{i,j} \right|, \quad i = 1, 2, \ldots, n-4$$

Page 46: The Application Of Genetic Programming

Step 3: Express the hybrid dynamic forecasting model that combines the NDGM(1,1) model and the GP model as follows:

$$\hat{y} = \hat{y}_t + \hat{e}_t$$

where $\hat{y}$ denotes the forecasted value of $y$; $\hat{y}_t$ represents the series $(\hat{y}^{(0)}(5), \hat{y}^{(0)}(6), \ldots, \hat{y}^{(0)}(n))$; and $\hat{e}_t$ represents the series $(\hat{r}_{1,5}, \hat{r}_{2,6}, \ldots, \hat{r}_{n-4,n})$.

Page 47: The Application Of Genetic Programming

Abstract

Definition of GP-VAD algorithm

Page 48: The Application Of Genetic Programming

A voice activity detector (VAD) is a classifier whose output is 1 or 0, indicating, respectively, the presence of voice or of silence (noise) in each speech frame.

A voice activity detection (VAD) algorithm is generated by using genetic programming (GP). The inputs of this VAD are the parameters extracted from the speech signals according to the ITU-T G.729B VAD standard.

The GP-based VAD algorithm (GP-VAD) is evaluated using the AURORA-2 database.

Page 49: The Application Of Genetic Programming

GP-VAD employs the same five parameters extracted by G.729B within each 10 ms frame:

a) the full-band energy, $E_f$;
b) the full-band energy difference, $\Delta E_f$;
c) the low-band energy difference, $\Delta E_l$;
d) the zero-crossing rate difference, $\Delta ZC$;
e) the spectral distortion, $\Delta S$.

Let Y(n) be the GP-VAD decision at frame n. The previous decisions Y(n-1) and Y(n-2) are also incorporated as inputs.

Page 50: The Application Of Genetic Programming

For the GP-VAD approach, the five preparatory steps mentioned above were defined as follows:

a) Function set.
b) Terminal set.
c) Fitness measure.
d) Control parameters.
e) The termination criterion.

Page 51: The Application Of Genetic Programming

There are arithmetic functions and logical functions. The function set is F={+, -, *, %, AND, OR, NOT, GT}, where % is the protected division, i.e., it returns 1 when division by zero takes place and otherwise returns the normal quotient. The AND, OR, NOT, and greater-than (GT) functions return the values 1 or 0 instead of Boolean values.
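A minimal sketch of this function set, with a tiny evaluator for program trees written as nested tuples, is shown below. Treating any nonzero input as "true" in the logical functions, the tuple encoding, and the terminal names `Ef`, `dS`, and `Y1` are assumptions made for illustration; they are not specified in the source.

```python
import operator


def protected_div(a, b):
    """Protected division: return 1 when dividing by zero, else the normal quotient."""
    return 1.0 if b == 0 else a / b


def _bit(condition):
    """Map a Python boolean to the numeric values 1 or 0 used by the function set."""
    return 1.0 if condition else 0.0


# Assumption: logical functions treat any nonzero input as true.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "%": protected_div,
    "AND": lambda a, b: _bit(a != 0 and b != 0),
    "OR": lambda a, b: _bit(a != 0 or b != 0),
    "NOT": lambda a: _bit(a == 0),
    "GT": lambda a, b: _bit(a > b),
}


def evaluate(tree, terminals):
    """Evaluate a tree given as nested tuples: (function_name, child, child, ...)."""
    if isinstance(tree, tuple):
        name, *args = tree
        return FUNCTIONS[name](*(evaluate(arg, terminals) for arg in args))
    if isinstance(tree, str):
        return terminals[tree]      # named terminal, looked up for the current frame
    return float(tree)              # numerical constant


# Hypothetical program: speech if Ef % dS > 2 and the previous decision was 0.
tree = ("AND", ("GT", ("%", "Ef", "dS"), 2.0), ("NOT", "Y1"))
frame = {"Ef": 6.0, "dS": 0.0, "Y1": 0.0}
decision = evaluate(tree, frame)
```

Because every function returns a number, any function can be nested inside any other, which keeps crossover and mutation free to recombine subtrees without type checks. In the example frame the division by zero is protected and yields 1, so the greater-than test fails and the decision is 0.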

Page 52: The Application Of Genetic Programming

There are three types of terminals: numerical constants, the parameters extracted by G.729B, and the VAD decisions in the two previous frames.

The terminal set is

$$T = \{R, E_f, \Delta E_f, \Delta E_l, \Delta ZC, \Delta S, Y(n-1), Y(n-2)\}$$

where $R$ represents the set of floating-point constants from -100.0 to 100.0.

Page 53: The Application Of Genetic Programming

Every GP-VAD program tree $i$ in the population is evaluated according to the error rate per unit, defined as:

$$E(i) = E_S(i) + k\,E_N(i)$$

where $E_S$ and $E_N$ are the detection error rates on speech and non-speech frames, respectively, and the constant $k$ is set to give more importance to $E_S$ than to $E_N$, typically $k = 0.4$.

Page 54: The Application Of Genetic Programming

The fitness function, $f(i)$, is defined as:

$$f(i) = 1 - E(i) - 0.1\,Q(i)$$

where $Q(i)$ is a penalty function, which penalizes individuals whose number of transitions between speech and non-speech stages, $NT_{GPVAD}$, is greater than the target number of transitions of the model used as a reference, $NT_{REF}$. The penalty function is defined as

$$Q(i) = \max\left\{ \frac{NT_{GPVAD} - NT_{REF}}{NT_{REF}},\; 0 \right\}$$
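The fitness computation can be sketched as below, assuming the formulas read E(i) = E_S(i) + k E_N(i), f(i) = 1 - E(i) - 0.1 Q(i), and Q(i) = max{(NT_GPVAD - NT_REF) / NT_REF, 0}. Computing E_S and E_N as the fraction of misclassified speech and non-speech frames is also an assumption, and all function names and the sample decisions are hypothetical.

```python
def count_transitions(decisions):
    """Number of changes between speech (1) and non-speech (0) in a decision sequence."""
    return sum(1 for a, b in zip(decisions, decisions[1:]) if a != b)


def error_rate(pred, ref, k=0.4):
    """E = E_S + k * E_N, with per-class error fractions (assumed definition)."""
    speech_errors = [p != r for p, r in zip(pred, ref) if r == 1]
    noise_errors = [p != r for p, r in zip(pred, ref) if r == 0]
    e_s = sum(speech_errors) / len(speech_errors)   # needs at least one speech frame
    e_n = sum(noise_errors) / len(noise_errors)     # needs at least one noise frame
    return e_s + k * e_n


def penalty(nt_gpvad, nt_ref):
    """Q: relative excess of transitions over the reference (nt_ref must be nonzero)."""
    return max((nt_gpvad - nt_ref) / nt_ref, 0.0)


def fitness(pred, ref, k=0.4):
    q = penalty(count_transitions(pred), count_transitions(ref))
    return 1.0 - error_rate(pred, ref, k) - 0.1 * q


reference = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
candidate = [0, 1, 0, 1, 1, 0, 0, 1, 1, 0]
perfect_score = fitness(reference, reference)
candidate_score = fitness(candidate, reference)
```

In this example the candidate misclassifies one of five speech frames and one of five non-speech frames, giving E = 0.2 + 0.4 * 0.2 = 0.28, and it makes six transitions against four in the reference, giving Q = 0.5, so its fitness is 1 - 0.28 - 0.05 = 0.67.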

Page 55: The Application Of Genetic Programming

The control parameters were set as follows:

a) population size, 300;
b) maximum number of generations, 500;
c) maximum depth size, 17;
d) tournament selection size, 2;
e) crossover probability, 0.9;
f) standard mutation probability, 0.05.

Page 56: The Application Of Genetic Programming

The termination criterion was:

a) The maximum number of generations.
b) The result corresponded to the best evolved individual.

Page 57: The Application Of Genetic Programming