a generalized pattern matching approach for multi-step


  • 8/7/2019 A Generalized Pattern Matching Approach for Multi-step


    A generalized pattern matching approach for multi-step prediction of crude oil price

    Ying Fan a,1, Qiang Liang a,b,2, Yi-Ming Wei a,*,3

    a Center for Energy and Environmental Policy Research, Institute of Policy and Management, Chinese Academy of Sciences, Beijing, 100080, China
    b Graduate University, Chinese Academy of Sciences, Beijing, 100080, China

    Received 10 November 2005; received in revised form 22 October 2006; accepted 22 October 2006

    Available online 5 December 2006

    Abstract

    This paper applies the pattern matching technique to multi-step prediction of crude oil prices and proposes a new approach: generalized pattern matching based on genetic algorithm (GPMGA), which forecasts future crude oil prices from historical observations. The approach detects the historical pattern most similar to the pattern in contemporary crude oil prices and, based on that similar historical pattern, derives a multi-step prediction of future crude oil prices. In the GPMGA modeling process, traditional pattern matching is not employed directly. Instead, historical data are transformed to larger or smaller scales in the x-axis and the y-axis directions, so that a generalized price pattern reflecting the current price movement can be obtained. This treatment overcomes the local deficiency of the traditional pattern modeling and recognition system (PMRS) approach and, in addition, allows a matched historical pattern of larger size to be found. Since the approach takes into account not only historical similarities but also differences, the concept of generalized pattern matching is proposed here. It provides a new basis for multi-step prediction by finding more essential similarities through various transformations. An empirical study is carried out for one-month forecasting of the Brent and WTI crude oil prices, and satisfactory forecasting results are obtained. Finally,

    Available online at www.sciencedirect.com

    Energy Economics 30 (2008) 889–904

    www.elsevier.com/locate/eneco

    Grant sponsors: National Natural Science Foundation of China under grant Nos. 70425001, 70573104 and 70371064, and the Key Projects of National Science and Technology of China (2001-BA608B-15, 2001-BA605-01).
    * Corresponding author. Institute of Policy and Management (IPM), Chinese Academy of Sciences (CAS), P.O. Box 8712, Beijing 100080, China. Tel./fax: +86 10 62650861.
    E-mail address: [email protected] (Y.-M. Wei).
    1 Dr. Ying Fan is a Professor at the Institute of Policy and Management, Chinese Academy of Sciences, China. Her research field is energy policy and system engineering. In 2004, she was a visiting scholar at Cornell University, USA.
    2 Mr. Qiang Liang is a Ph.D. candidate in Management Science at the Institute of Policy and Management, Chinese Academy of Sciences, China.
    3 Dr. Yi-Ming Wei is a Professor at the Institute of Policy and Management of the Chinese Academy of Sciences, China. He was a visiting scholar at Harvard University in the United States in 2005.

    0140-9883/$ - see front matter © 2006 Elsevier B.V. All rights reserved.

    doi:10.1016/j.eneco.2006.10.012


    comparisons with some other time series prediction approaches, such as PMRS and Elman network,

    demonstrate the effectiveness and superiority of GPMGA over others.

    © 2006 Elsevier B.V. All rights reserved.

    Keywords: Pattern matching; Genetic algorithm; Crude oil price; Multi-step prediction

    1. Introduction

    Crude oil, sometimes called the blood of industry, plays an important role in every economy. The oil price, as one of the main focal points in many countries, is an increasingly essential topic of concern to governments, enterprises and investors. Influenced by many complicated factors, oil prices appear highly nonlinear and even chaotic, as Panas and Ninni (2000) and Adrangi et al. (2001) pointed out, which makes it rather difficult to forecast future oil prices, especially in multi-step prediction. Nevertheless, a wide variety of oil price forecasting approaches have been proposed, on which many decisions with regard to oil prices have to be based.

    Oil price forecasting approaches basically fall into two classes: single-factor time series models and multi-factor models. The former treat time as the independent variable and build mathematical models on the oil price time series itself to produce future predictions, while the latter also take into account the main factors influencing oil prices, such as GDP, supply and demand, and often construct multi-variable models to forecast future oil prices. In comparison with multi-factor models, which have to forecast future values of the influencing factors before predicting oil prices, single-factor models have the advantage of avoiding the uncertainty involved in predicting other related variables.

    There is a substantial literature on time series methods for oil price prediction, but most of it deals only with single-step forecasting. Dominguez (1989), Crowder and Hamed (1994), Moosa and Al-Loughani (1994) and Gulen (1998) mention that the one-month forward price is a remarkable indicator for short-term predictions of oil prices, and a few studies indicate that oil prices exhibit evident GARCH properties. A semi-parametric approach based on the VAR prediction approach is suggested to obtain a forecast of the entire density function of the price of an asset (Barone-Adesi et al., 1998). The forecasting ability of the one-month forward price as an indicator of short-term oil prices is considered again by Claudio (2001), and the approach proposed by Barone-Adesi et al. (1998) is adopted to perform a short-term forecast of the Brent oil price. Belief networks are introduced by Abramson and Finizza (1991) and Abramson (1994), and Monte Carlo analysis is used to predict the crude oil price. Afterwards, the combination of belief network and probabilistic model presented by Abramson and Finizza (1995) is applied to produce a probabilistic forecast in oil markets. Kaboudan (2001) uses compumetric methods to perform short-term monthly forecasts of crude oil prices and suggests that genetic programming has an advantage over random walk predictions, while the neural network forecast proved inferior. Tang and Hammoudeh (2002) find that a nonlinear model based on the Target Zone Theory can greatly improve oil price forecasting ability. Ye et al. (2002, 2005, 2006a,b) present a short-term forecasting model of monthly West Texas Intermediate crude oil spot prices using OECD petroleum inventory levels. Yousefi et al. (2005) introduce a wavelet-based prediction procedure, and market data on crude oil are used to provide forecasts over different forecasting horizons. Sadorsky (2006) uses several different univariate and multivariate statistical models such as TGARCH and GARCH to estimate forecasts of daily


    volatility in petroleum futures price returns. Ye et al. (2006a,b) show the effect that surplus crude oil production capacity has on short-term crude oil prices.

    In recent years, a variety of new theories such as neural network, Markov models and generalized system have been applied to financial prediction (Prigmore and Long, 2003; Kodogiannis and Lolis, 2002; Fong and See, 2002). Peters (1994) points out that most financial markets are not Gaussian distributed but tend to have sharper peaks and fat tails, a phenomenon now well known in practice. Given such evidence, many traditional methods based on the Gaussian (normal) assumption fall short in making good forecasts. One of the key findings explained by Peters (1994) is that most financial markets have a long memory; what happens today affects the future forever. In other words, current data are correlated with all past data to varying extents. The long-memory component of the market cannot be adequately explained by a system that works with short-memory parameters. Therefore, prediction approaches based on historical pattern matching should be chosen for long-memory systems. The Fractal Market Hypothesis also offers sturdy support for the feasibility of historical pattern matching approaches. Peters (1994) provided extensive evidence supporting his claim that markets have a fractal structure based on different investment horizons, and that they cover a longer distance than the square root of time, i.e. they are not random in nature. Instead, they consist of non-periodic cycles which are hard to detect and use in statistical or neural forecasting. He argues that financial markets are predictable, but accurate predicting algorithms need long memories.

    Farmer and Sidorowich (1988) find that chaotic time series prediction using local approximation techniques is much better than using global approximations. Local approximation refers to the idea of breaking up the time domain into small neighborhood regions and analyzing them separately. As an empirical study, the combination of genetic algorithm, neural network and chart pattern recognition is employed to forecast stock prices (Leigh et al., 2000). The integration of chart pattern recognition and a wavelet radial basis function network is also applied to the same task (Liu et al., 2004). The nearest neighbor approach presented by Farmer and Sidorowich (1988) and the pattern imitation techniques presented by Motnikar et al. (1996) are used for time series prediction. Recently, the pattern modeling and recognition system (PMRS) presented by Singh (1999a,b; 2001) and Singh and Fieldsend (2001) has been adopted for financial time series forecasting. The successful application of local approximation approaches such as chart pattern recognition, nearest neighbor approaches, pattern imitation techniques and PMRS indicates that such approaches are quite effective for the prediction of nonlinear time series.

    The oil price time series is a nonlinear long-memory series (Alvarez-Ramirez et al., 2002, 2003; Robinson and Yajima, 2002; Bernabe et al., 2004; Gil-Alana, 2001). As such, the forecasting ability for the oil price time series can be improved by using local approximation approaches.

    In this paper, a new local approximation approach based on genetic algorithm and generalized pattern matching is proposed and applied to the prediction of crude oil price time series. An empirical study is carried out for one-month forecasting of the Brent and WTI crude oil prices respectively, and satisfactory forecasting results are achieved. Finally, the comparison between GPMGA and some other time series prediction approaches, such as PMRS and the Elman neural network, demonstrates the effectiveness and superiority of GPMGA.

    This paper is organized as follows: in Section 2, the generalized pattern matching approach is

    proposed after the pattern modeling and recognition system is introduced, and the generalized

    pattern matching approach based on genetic algorithm is described; in Section 3, the Elman

    network is employed as a comparative approach; in Section 4, an empirical study is conducted to

    show the precision of GPMGA; and the last section is the conclusion.


    2. Generalized pattern matching based on genetic algorithm

    In this section, the pattern modeling and recognition system is introduced, the generalized pattern matching approach is then proposed, and a parameter optimization based on the genetic algorithm is described as well.

    2.1. Pattern modeling and recognition system (PMRS)

    Denote the time series as $y = \{y_1, y_2, \ldots, y_n\}$, where $n$ is its length. The past state $s_p = \{y_{j-k+1}, y_{j-k+2}, \ldots, y_j\}$ most similar to the current state $s_c = \{y_{n-k+1}, y_{n-k+2}, \ldots, y_n\}$ can be searched for among the historical data of the time series by using pattern matching approaches, where $k$ is the size of the state; state $s_p$ is called the nearest neighbor of state $s_c$. Then the past state $\{y_{j+1}, y_{j+2}, \ldots, y_{j+m}\}$ is used to forecast the future state $\{y_{n+1}, y_{n+2}, \ldots, y_{n+m}\}$, where $m$ is the prediction length. A segment of the time series $y$ is defined as $\delta = (\delta_i, \delta_{i+1}, \ldots, \delta_{i+k-1})$, where $k$ is the size of the segment, $1 \le i \le i+k-1 \le n-1$, and $\delta_j = y_{j+1} - y_j$, $i \le j \le i+k-1$. The offset between two states $\{y_p, y_{p+1}, \ldots, y_{p+k}\}$ and $\{y_q, y_{q+1}, \ldots, y_{q+k}\}$ is denoted as

    $$\nabla = \sum_{i=1}^{k} w_i \left| \delta_{p+i-1} - \delta_{q+i-1} \right|, \quad 1 \le p \le p+k \le n, \quad 1 \le q \le q+k \le n,$$

    where $w_i$ is the weight, $1 \le i \le k$. A pattern is defined as $\rho = (b_i, b_{i+1}, \ldots, b_{i+k-1})$, where $k$ is the size of pattern $\rho$, $1 \le i \le i+k-1 \le n-1$, and

    $$b_j = \begin{cases} 1, & y_{j+1} \ge y_j \\ 0, & y_{j+1} < y_j \end{cases}, \quad i \le j \le i+k-1;$$

    $b_j$ is called the tag and $i+k-1$ is called the marker position. The main point of PMRS is to model the current pattern of a time series by directly matching its current pattern with past patterns, and then a forecast can be made according to the pattern following the most similar past pattern (Singh et al., 1999a,b; Singh, 2001; Singh and Fieldsend, 2001).
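    The direct matching idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PMRS implementation of Singh: the function name `pmrs_forecast` is hypothetical, the weights default to 1, the offset uses absolute differences, and the forecast simply replays the price changes that followed the best-matching past segment.

```python
import numpy as np

def pmrs_forecast(y, k, m, weights=None):
    """Illustrative nearest-neighbour forecast on first differences."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = np.diff(y)                        # d[j] = y[j+1] - y[j]
    w = np.ones(k) if weights is None else np.asarray(weights, dtype=float)
    current = d[n - 1 - k:]               # the k most recent differences
    best_end, best_offset = None, np.inf
    # A candidate segment d[e-k:e] ends at observation y[e]; the m values
    # after it must already be known, hence e + m <= n - 1.
    for e in range(k, n - m):
        offset = float(np.sum(w * np.abs(d[e - k:e] - current)))
        if offset < best_offset:
            best_end, best_offset = e, offset
    if best_end is None:
        raise ValueError("series too short for the requested k and m")
    # Assumption: replay the changes that followed the matched segment.
    return y[-1] + np.cumsum(d[best_end:best_end + m])
```

    For a strictly periodic series the nearest neighbor is an exact earlier copy of the current segment, so the forecast continues the cycle.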

    2.2. Generalized pattern matching (GPM)

    PMRS can be considered a direct pattern matching method, since the current pattern and the past pattern are matched without any transformation. Searching the historical data directly for the past state most similar to the current one implies that current oil prices are expected to change exactly according to the historical rules. However, the complexity of oil price movements often makes this kind of direct matching inaccurate. On the one hand, historical rules do not always appear in the same way. For example, oil prices may have risen by 2 dollars over two months in the past, while a later movement is similar but faster, i.e. it takes only 1 month to rise by 2 dollars. Such phenomena may be caused by differences in time and market conditions, although the rule behind them may be similar. In other words, historical rules may recur only approximately rather than exactly. On the other hand, a small pattern size is usually used for a strong local search in the PMRS method, which ensures that the current pattern and the past pattern match well only over relatively short periods, while deviating seriously for longer pattern sizes. Thus, when performing a multi-step prediction, it is hard to find a similar state in the historical time series by using PMRS. Here, a generalized pattern matching approach (GPM) is proposed in order to improve the PMRS method in these two aspects.

    2.2.1. The rationale of GPM

    Denote a state $\{y_i, y_{i+1}, \ldots, y_{i+k-1}\}$ of the time series as $s(i,k)$, $1 \le i \le i+k-1 \le n$, where $i$, called the beginning position of state $s$, indicates the position of the first element of state $s$ in the time series $y$, and $k$ is the size of state $s$, namely, the number of elements in state $s$. Obviously the


    subsequent state $s'$ of state $s$ is the state whose beginning position is just behind the last element of state $s$. For instance, the subsequent state of state $s(i,k) = \{y_i, y_{i+1}, \ldots, y_{i+k-1}\}$ is $s'(i+k, k') = \{y_{i+k}, y_{i+k+1}, \ldots, y_{i+k+k'-1}\}$, where $k'$ is the size of $s'$.

    States in the time series are directly used to implement pattern matching in PMRS, while in generalized pattern matching an appropriate scale transform is first imposed on the states in both the x-axis and the y-axis directions, which is equivalent to matching states after the historical rules of oil price movements have been appropriately adjusted. Suppose the series $(x_1, x_2, \ldots, x_M)$ is obtained from state $s(i,k)$ through a transformation; a pattern is then defined as $\rho(i,M,\alpha,\beta)$, where $i$ and $M$ are the starting position and size of pattern $\rho$, $\alpha$ and $\beta$ are respectively called the scale factors in the x-axis and the y-axis directions of pattern $\rho$, and the state $s(i,k)$ before the transform is called the original state of pattern $\rho$. This procedure is described in detail as follows.

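    The original state is transformed in the x-axis direction by the factor $\alpha$. In general, suppose $\alpha \in (0,+\infty)$; the original state $s(i,k)$ is processed through linear interpolation, and pattern $\rho$ is then obtained after a scale transform in the x-axis direction. Denote $t = (j-1)/\alpha$ and $\bar{t} = \lfloor t \rfloor$; pattern $\rho$ and its original state $s$ satisfy the following equations:

    $$M = \lfloor k\alpha - \alpha + 1 \rfloor \qquad (1)$$

    $$x_j = y_{i+\bar{t}} + (t - \bar{t})\,(y_{i+\bar{t}+1} - y_{i+\bar{t}}) \qquad (2)$$

    where $1 \le j \le M$.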

    Likewise, the original state is transformed in the y-axis direction by the factor $\beta$. In general, suppose $\beta \in (0,+\infty)$; the original state $s(i,k)$ is simply multiplied by $\beta$. Pattern $\rho$ and its original state $s$ satisfy the following equations:

    $$M = k \qquad (3)$$

    $$x_j = \beta\, y_{i+j-1}, \quad 1 \le j \le k. \qquad (4)$$
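    The two scale transforms can be illustrated with a short Python sketch. It assumes that a pattern with both factors is obtained by first interpolating along the x-axis and then multiplying by the y-axis factor; the function name `transform_state` and the index guard at the end of the state are choices made for this illustration only.

```python
import math
import numpy as np

def transform_state(state, alpha=1.0, beta=1.0):
    """Scale a state by alpha along x (interpolation) and by beta along y."""
    state = np.asarray(state, dtype=float)
    k = len(state)
    M = math.floor(k * alpha - alpha + 1)      # pattern size after x-scaling
    pattern = np.empty(M)
    for j in range(1, M + 1):                  # j = 1..M as in the text
        t = (j - 1) / alpha
        t_bar = min(math.floor(t), k - 1)      # integer part of t
        nxt = min(t_bar + 1, k - 1)            # guard: stay inside the state
        pattern[j - 1] = state[t_bar] + (t - t_bar) * (state[nxt] - state[t_bar])
    return beta * pattern
```

    With a factor of 2 along the x-axis, a three-point state is stretched to five points; with a factor of 0.5, a five-point state is compressed to three.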

    The transform factors $\alpha$ and $\beta$ represent the differences between the current price fluctuation and the similar historical phenomenon. $\alpha$ reflects the relation between current price movements and the past in the time scale, given a certain price change. If $\alpha = 1$, the current pattern takes on the same scale as the searched historical one, i.e., it takes the same time to realize the given price change both at present and in the past. If $\alpha < 1$, the current pattern is what the historical one scales down to, i.e., it takes a shorter period to fulfill the price change at present than in the past. If $\alpha > 1$, the current pattern is what the historical one scales up to, i.e., it takes a longer period to achieve the price change at present than in the past. Similarly, $\beta$ reflects the relation between current price movements and the past in the price scale, given a certain period. If $\beta = 1$, the current pattern manifests the same scale as the searched historical one, i.e., there is the same price change during the given period both at present and in the past. If $\beta > 1$, the current pattern is what the historical one scales up to, i.e., a larger price change appears during the period at present than in the past. If $\beta < 1$, the current pattern is what the historical one scales down to, i.e., a smaller price change takes place during the period at present than in the past.


    Unlike PMRS, which matches the current state and the past one directly, GPM takes into consideration the differences between the current state and the past one in addition to their similarities, and the past state is scaled in both the x-axis and the y-axis directions to match the current one indirectly, so it is suitably called "generalized pattern matching". When GPM is adopted, a satisfactory past pattern most similar to the current one can be found even if a large pattern size is required. Since the similarity between the current and the past patterns in the generalized sense can be maintained over a long period, a more satisfactory result can be obtained when performing a multi-step prediction.

    2.2.2. The modeling process of GPM

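    As in PMRS, the pattern structure in GPM is determined by the pattern size. For the time series $y$, suppose the current time is $n$ and an $m$-step prediction is required. Denote the actual values in the prediction period as $\{y_{n+1}, y_{n+2}, \ldots, y_{n+m}\}$ and the predicted values as $\{\hat{y}_{n+1}, \hat{y}_{n+2}, \ldots, \hat{y}_{n+m}\}$. For a certain $k$, the current state $s_c(n-k+1, k)$ is taken as the current pattern directly, i.e.

    $$\rho_c(n-k+1, k, 1, 1) = s_c(n-k+1, k). \qquad (5)$$

    The past pattern $\rho_h(i,k,\alpha,\beta)$ that is most similar to the current pattern $\rho_c$ can be calculated by the genetic algorithm explained in the next section. Correspondingly, the original state of the past pattern is $s_h(i,k')$. Since the scale transform in the y-axis direction does not change the pattern size, the relation between $k$ and $k'$ follows from Eq. (1):

    $$k = \lfloor k'\alpha - \alpha + 1 \rfloor. \qquad (6)$$

    Therefore, $k \le k'\alpha - \alpha + 1 < k+1$, that is:

    $$\frac{k+\alpha-1}{\alpha} \le k' < \frac{k+\alpha}{\alpha}. \qquad (7)$$

    The minimal integer $k'$ satisfying the lower bound in Eq. (7) should be taken as the size of the original state $s_h$, i.e.

    $$k' = \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil. \qquad (8)$$

    So the original state of pattern $\rho_h(i,k,\alpha,\beta)$ is $s_h\left(i, \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil\right)$. Subsequently, the beginning position of the consequent state $s'_h$ is $i + \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil$. Since $\rho'_h\left(i + \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil, m, \alpha, \beta\right)$ is obtained from $s'_h$ through the transform, the size of $s'_h$ should be $\left\lceil \frac{m+\alpha-1}{\alpha} \right\rceil$. Pattern $\rho'_h\left(i + \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil, m, \alpha, \beta\right)$ can thus be obtained from state $s'_h\left(i + \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil, \left\lceil \frac{m+\alpha-1}{\alpha} \right\rceil\right)$ through the same transform as the one used for pattern $\rho_h$.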
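    The bookkeeping of state sizes and positions can be checked numerically with a small Python sketch. It assumes that "the minimal size" means rounding the lower bound up to the next integer; the function name `original_state_layout` and its return order are hypothetical.

```python
import math

def original_state_layout(i, k, m, alpha):
    """Sizes and positions of the states behind a scaled pattern.

    Returns (k_orig, next_start, m_orig):
      k_orig     - size of the original state that yields a pattern of size k
      next_start - beginning position of the consequent state
      m_orig     - size of the consequent state that yields m predicted values
    Assumption: the smallest admissible integer size is used (ceiling).
    """
    k_orig = math.ceil((k + alpha - 1) / alpha)
    next_start = i + k_orig
    m_orig = math.ceil((m + alpha - 1) / alpha)
    return k_orig, next_start, m_orig
```

    For example, with the x-axis factor equal to 0.5, a current pattern of 6 points corresponds to 11 historical points, since every second historical point is used.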

    On the assumption, shared by many local approximation approaches, that local similarities exist in the time series, namely that there are always similar periods in history, if the past pattern $\rho_h$ generated from the past state $s_h$ through scale transforms best matches the current pattern $\rho_c$, then the pattern $\rho'_h$ generated from the consequent state $s'_h$ can reasonably be expected to match the future pattern $\rho_f$. So $\rho_f$ can be directly derived from $\rho'_h$. Meanwhile, $\rho_f$ is generated from the future state $s_f$ without any transform, just as $\rho_c$ is not transformed either in


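    the x-axis or the y-axis direction, therefore the future state $s_f$ can be derived from $\rho_f$ directly, i.e. the result of the $m$-step prediction is:

    $$\{\hat{y}_{n+1}, \hat{y}_{n+2}, \ldots, \hat{y}_{n+m}\} = s_f(n+1, m) = \rho_f(n+1, m, 1, 1) = \rho'_h\left(i + \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil, m, \alpha, \beta\right). \qquad (9)$$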

    To quantify the difference between two patterns $\rho_1(i_1,M,\alpha_1,\beta_1) = \{x_1^1, x_2^1, \ldots, x_M^1\}$ and $\rho_2(i_2,M,\alpha_2,\beta_2) = \{x_1^2, x_2^2, \ldots, x_M^2\}$ of the same pattern size in the pattern matching process, an offset between patterns $\rho_1$ and $\rho_2$ is defined as follows:

    $$\nabla = \sqrt{\frac{1}{M}\sum_{j=1}^{M}\left(x_j^1 - x_j^2\right)^2}. \qquad (10)$$
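    The offset between two equally sized patterns is a root-mean-square difference, which can be computed as in the following Python sketch (the function name `pattern_offset` is chosen for illustration).

```python
import numpy as np

def pattern_offset(pattern_1, pattern_2):
    """Root-mean-square difference between two patterns of equal size."""
    p1 = np.asarray(pattern_1, dtype=float)
    p2 = np.asarray(pattern_2, dtype=float)
    if p1.shape != p2.shape:
        raise ValueError("patterns must have the same size")
    return float(np.sqrt(np.mean((p1 - p2) ** 2)))
```

    Identical patterns give an offset of zero, and the offset grows with the average squared distance between corresponding points.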

    To sum up, the modeling process of GPM can be described as follows. For each possible $k$, the steps below should be gone through:

    (1) Search for the historical pattern $\rho_h(i,k,\alpha,\beta)$ most similar to the current pattern $\rho_c(n-k+1,k,1,1)$, whose original state is $s(n-k+1,k) = \{y_{n-k+1}, y_{n-k+2}, \ldots, y_n\}$, by minimizing the offset between $\rho_h$ and $\rho_c$;

    (2) Determine the original state $s_h$ of pattern $\rho_h$ and its subsequent state $s'_h$ accordingly;

    (3) Apply the same transform as the one for the best matched historical pattern $\rho_h(i,k,\alpha,\beta)$ to $s'_h$, and compute the prediction $\{\hat{y}_{n+1}, \hat{y}_{n+2}, \ldots, \hat{y}_{n+m}\}$;

    (4) Calculate the prediction error;

    (5) Choose the optimal $k$ which minimizes the prediction error.
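    Steps (1) to (3) can be sketched for a single pattern size as follows. This is a minimal Python illustration under explicit assumptions: an exhaustive grid search over candidate scale factors stands in for the genetic algorithm used in the paper, positions are zero-based, state sizes are rounded up, and the names `scale_state`, `gpm_forecast`, `alphas` and `betas` are hypothetical.

```python
import math
import numpy as np

def scale_state(state, alpha, beta, size):
    """Stretch a state by alpha along x (interpolation) and by beta along y."""
    state = np.asarray(state, dtype=float)
    last = len(state) - 1
    out = np.empty(size)
    for j in range(size):                       # j is zero-based here
        t = j / alpha
        lo = min(math.floor(t), last)
        hi = min(lo + 1, last)
        out[j] = state[lo] + (t - lo) * (state[hi] - state[lo])
    return beta * out

def gpm_forecast(y, k, m, alphas, betas):
    """Grid-search stand-in for the GA: returns (forecast, (i, alpha, beta))."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    current = y[n - k:]                         # current pattern, untransformed
    best_offset, best_params, forecast = math.inf, None, None
    for alpha in alphas:
        k_orig = math.ceil((k + alpha - 1) / alpha)   # size of original state
        m_orig = math.ceil((m + alpha - 1) / alpha)   # size of consequent state
        # both states must lie inside the known history
        for i in range(0, n - k_orig - m_orig + 1):
            for beta in betas:
                past = scale_state(y[i:i + k_orig], alpha, beta, k)
                offset = math.sqrt(np.mean((past - current) ** 2))
                if offset < best_offset:
                    consequent = y[i + k_orig:i + k_orig + m_orig]
                    best_offset = offset
                    best_params = (i, alpha, beta)
                    forecast = scale_state(consequent, alpha, beta, m)
    return forecast, best_params
```

    Steps (4) and (5) would wrap this function in a loop over candidate pattern sizes and keep the size with the smallest error on held-out data.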

    As the above procedure shows, the optimal pattern size $k$ is found by enumeration, but the difference from PMRS is that a larger $k$ does not lead to a great departure between the searched past patterns and the current pattern, and instead makes past patterns and the current pattern match better over a longer period. This implies that GPM can more accurately reflect the historical rules of oil price movement.

    To search for the past pattern $\rho_h$ most similar to the current pattern $\rho_c$, the following optimization problem should be solved: for a given pattern size $k$, find the optimal parameters $i$, $\alpha$ and $\beta$ by minimizing the offset between the two patterns $\rho_c(n-k+1,k,1,1)$ and $\rho_h(i,k,\alpha,\beta)$. This problem can be solved effectively by means of the genetic algorithm presented below.

    2.2.3. Generalized pattern matching based on genetic algorithm (GPMGA)

    The genetic algorithm, a global optimization method, simulates the evolutionary process of life forms in nature. Individuals of one generation exchange information with other individuals through genetic operators such as selection and crossover, so that a new, better generation is obtained. By iterating this process, an optimal solution can be sought. The genetic algorithm, which can avoid being trapped in local optima during the search, holds an advantage over common local search methods. The combination of the genetic algorithm and other methods plays an increasingly important role in the forecasting field. Leigh et al. (2000) integrate genetic


    algorithm, neural network and chart pattern recognition to predict stock prices, and Nunnari (2004) adopts wavelet function approximation based on genetic algorithm to forecast air pollution time series; both attained good results.

    In this paper, the genetic algorithm is employed to search for the optimal solution of the above optimization problem. Suppose a pattern $\rho(i,k,\alpha,\beta)$ represents an individual, whose chromosome must carry the information of three parameters: $i$, $\alpha$ and $\beta$. Thus the decimal strings of these three parameters are assigned as the gene group of a chromosome.

    In the process of evolution, the fitness function, which is used to evaluate how good an individual is, is defined as follows:

    $$g(\rho) = \frac{1}{\eta \nabla} \qquad (11)$$

    where $\eta$ is a constant and $\nabla$ is the offset between the past pattern $\rho_h$ and the current pattern $\rho_c$.

    To keep the evolution moving towards better generations over time, the classical roulette wheel approach combined with the elite strategy is implemented. During roulette wheel selection, two mates are selected for reproduction, each with a probability proportional to its fitness value.
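    Roulette wheel selection combined with an elite strategy can be sketched as below. This is an illustrative Python fragment rather than the authors' code: the function name `next_generation`, the `breed` callback (standing for crossover and mutation) and the number of elites are assumptions, and mates are drawn with replacement.

```python
import random

def next_generation(population, fitness, breed, rng, n_elite=1):
    """Roulette-wheel selection with elitism (illustrative)."""
    # Elite strategy: the fittest individuals survive unchanged.
    order = sorted(range(len(population)), key=lambda idx: fitness[idx], reverse=True)
    new_population = [population[idx] for idx in order[:n_elite]]
    # Roulette wheel: each mate is drawn with probability proportional to fitness.
    while len(new_population) < len(population):
        mother, father = rng.choices(population, weights=fitness, k=2)
        new_population.append(breed(mother, father))
    return new_population
```

    Passing a seeded `random.Random` instance as `rng` keeps a run reproducible.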

    Two-point-crossover approach is implemented as the crossover operator. The substrings

    defined by the chosen two random points in the selected pair of strings are exchanged with a

    certain probability.
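    A two-point crossover on two chromosome strings of equal length might look like the following Python sketch. The function name `two_point_crossover` and the default crossover probability of 0.8 are assumptions made for illustration; the paper does not state the probability at this point.

```python
import random

def two_point_crossover(a, b, rng, p_cross=0.8):
    """Swap the substring between two random cut points with probability p_cross."""
    if len(a) != len(b):
        raise ValueError("chromosomes must have the same length")
    if rng.random() >= p_cross:
        return a, b                              # no crossover this time
    lo, hi = sorted(rng.sample(range(len(a) + 1), 2))
    child_1 = a[:lo] + b[lo:hi] + a[hi:]
    child_2 = b[:lo] + a[lo:hi] + b[hi:]
    return child_1, child_2
```

    Because the two cut points are distinct, at least one gene is exchanged whenever crossover takes place.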

    Two-element swap mutation and self-adaptive mutation are implemented. The mutation temperature of an individual $j$ is defined as follows (He et al., 2002):

    $$T_j = U(0,1)\left(1 - \frac{g(\rho_j)}{\sum_{l=1}^{N} g(\rho_l)}\right) \qquad (12)$$

    where $U(0,1)$ is the uniform distribution over the range [0,1].

    Let $\theta_j$, $j = 1,2,3$, represent the parameters $i$, $\alpha$ and $\beta$ respectively; then the new parameter $\theta'_j$ of the new individual after self-adaptive mutation is defined as:

    $$\theta'_j = \theta_j + \sigma T_j N(0,1) \qquad (13)$$

    where the constant $\sigma \in [0,1]$ is called the severity coefficient, and $N(0,1)$ is the standard normal distribution.
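    The self-adaptive mutation can be sketched in Python as follows. The function names are hypothetical, fitness values are taken as given inputs, and the handling of the integer position parameter (rounding and clipping after mutation) is deliberately left out because the text does not specify it.

```python
import random

def mutation_temperatures(fitness, rng):
    """One temperature per individual: fitter individuals get smaller values."""
    total = sum(fitness)
    return [rng.uniform(0.0, 1.0) * (1.0 - g / total) for g in fitness]

def self_adaptive_mutate(params, temperature, sigma, rng):
    """Perturb each parameter by sigma * temperature * standard normal noise.

    Assumption: params is a list of floats (position, x-scale, y-scale);
    a real implementation would round and clip the position afterwards.
    """
    return [theta + sigma * temperature * rng.gauss(0.0, 1.0) for theta in params]
```

    An individual that holds most of the total fitness receives a temperature close to zero and is therefore barely perturbed.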

    3. Elman network

    The recurrent network developed by Elman has a simple architecture, and it can be trained

    using the standard BP learning algorithm. The context units of the Elman network memorize

    some past states of the hidden units, so the output of the network depends upon an aggregate of

    the previous states and the current input. In this architecture, in addition to the input, hidden and

    output units, there are also context units. The input and output units interact with the outside

    environment, while the hidden and context units do not. The input units are only buffer units that

    pass the signals without changing them. The output units are linear units which sum the signals

    fed to them. The hidden units have nonlinear sigmoidal functions. The feedforward connections


    are modifiable, but the recurrent connections are fixed. This network has proved effective and was applied as a benchmark model in the research of Kermanshahi and Iwamiya (2002) and Kodogiannis and Lolis (2002).
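    The role of the context units can be made concrete with a forward pass written in Python. This is a sketch under simplifying assumptions: a single hidden layer, randomly initialized weights and no training; the class name `ElmanNetwork` and its attribute names are hypothetical.

```python
import numpy as np

class ElmanNetwork:
    """Forward pass of a one-hidden-layer Elman network (no training)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # Modifiable feedforward weights (here only randomly initialized).
        self.W_in = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.W_ctx = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        self.W_out = rng.normal(scale=0.1, size=(n_out, n_hidden))
        self.context = np.zeros(n_hidden)       # memory of the last hidden state

    def step(self, x):
        pre = self.W_in @ np.asarray(x, dtype=float) + self.W_ctx @ self.context
        hidden = 1.0 / (1.0 + np.exp(-pre))     # sigmoidal hidden units
        self.context = hidden.copy()            # fixed one-to-one recurrent copy
        return self.W_out @ hidden              # linear output units
```

    Feeding the same input twice gives different outputs, because the second call also sees the hidden state stored by the first.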

    3.1. Data standardization

    Before neural network modeling, preprocessing the data with an appropriate transform can greatly improve the modeling result. Two transforms are common: one confines the transformed data values to a certain interval, and the other fixes the mean and the variance of the transformed data. The latter is employed here, so that the mean and the variance become 0 and 1. Denote an original observation as $y_t$, with mean $\bar{y}$ and variance $D_y$; then the transformed data $y'_t$ can be expressed as follows:

    $$y'_t = \frac{y_t - \bar{y}}{\sqrt{D_y}}. \qquad (14)$$
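    A standardization to zero mean and unit variance, together with its inverse for converting network outputs back to prices, can be sketched in Python as follows (the function names are chosen for illustration, and the population variance is assumed).

```python
import numpy as np

def standardize(y):
    """Return the standardized series plus the mean and standard deviation used."""
    y = np.asarray(y, dtype=float)
    mean, std = y.mean(), y.std()               # std = square root of the variance
    return (y - mean) / std, mean, std

def destandardize(z, mean, std):
    """Map standardized values back to the original scale."""
    return np.asarray(z, dtype=float) * std + mean
```

    Keeping the mean and standard deviation of the modeling data allows forecasts to be mapped back to dollars per barrel.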

    3.2. Related parameters

    3.2.1. Input layer

    The number of nodes in the input layer represents the length of the window, or the number of

    lagged observations used to discover the underlying pattern in a time series. This is the most

    crucial variable for a forecasting problem, since the vector contains important information about

    complex structure in the data. However, there is no widely accepted systematic way to determine

    the optimum length for an input vector (Zhang and Patuwo, 1998), therefore the Gamma test,

    autocorrelation and the ARMA model are used to analyze the data, and then the most appropriate

    node number of the input layer is decided accordingly (Stefansson et al., 1997; Sfetsos and Coonick, 2000).

    3.2.2. Hidden layers

    The number of the hidden nodes should make the training error as small as possible and the

    network architecture simplest, namely with as few hidden nodes as possible (Weiss and

    Fig. 1. The Brent crude oil prices (from 5/20/1987 to 7/26/2005).


    Kulikowski, 1991). To enhance the network's performance, two hidden layers are considered. The numbers of hidden nodes in the two hidden layers, denoted $S_1$ and $S_2$ respectively, are chosen as the values that minimize the training error.

    3.2.3. Output layer

    There are usually two forecasting approaches for neural networks, the iterative and the direct approaches (Wilson et al., 2002). The first requires only a single output node, with forecast values substituted into the input vector to make further predictions, while the second provides multiple output nodes that correspond to the forecasting horizon. In order to reduce the number of output nodes and achieve a better result, we employ the first method.
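    The iterative approach can be sketched independently of any particular network. In the Python fragment below, `one_step` stands for any trained one-step predictor (a hypothetical callable that maps a window of lagged values to the next value), and each forecast is fed back into the input window.

```python
def iterative_forecast(one_step, history, window, steps):
    """Multi-step forecast by feeding each prediction back into the input window."""
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        nxt = one_step(buffer)                  # single output node
        forecasts.append(nxt)
        buffer = buffer[1:] + [nxt]             # slide the window forward
    return forecasts
```

    As a usage example, a toy predictor that extrapolates the last change continues a linear series exactly.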

    4. Empirical study

    4.1. Data

    Daily Brent and WTI crude oil prices, obtained from the IEA website, are used in the empirical study. The price unit is dollars per barrel. To deal with a few missing data points, linear interpolation is performed on the original data. In this study, there are 4681 observations of the Brent daily oil price, dating from 5/20/1987 to 7/26/2005, as illustrated in Fig. 1.
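    Filling missing observations by linear interpolation can be done as in the following Python sketch, which assumes that missing prices are marked as NaN and that the observations are equally spaced (the function name `fill_missing` is hypothetical).

```python
import numpy as np

def fill_missing(prices):
    """Replace NaN entries by linear interpolation between known neighbours."""
    p = np.asarray(prices, dtype=float)
    idx = np.arange(len(p))
    known = ~np.isnan(p)
    return np.interp(idx, idx[known], p[known])
```

    Gaps at the very start or end of the series are filled with the nearest known value, which is the default behaviour of `numpy.interp` outside the known range.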

    Fig. 1 shows that the Brent oil price exhibits violent local fluctuation, which persists even over long periods. So do the WTI prices. The WTI data consist of 4933 daily observations from 1/2/1986 to 7/26/2005.

    To test the forecasting ability of GPMGA, PMRS and Elman network are employed as

    comparative approaches.

    All data are divided into three parts when PMRS, Elman network and GPMGA are employed

    to make a multi-step prediction, as shown in Fig. 2.

    The modeling data are used for in-sample computation of the parameters of PMRS, Elman

    network and GPMGA. The evaluating data are used for selection of the best parameters of these

    models. When a group of model parameters is calculated according to the modeling data, a

    prediction of the evaluating data is obtained and the forecasting results are compared with the

    Fig. 2. Data classification when modeling.

    Table 1

    The division of experimental data

    Data    Parts            Period                         Length
    Brent   Modeling data    From 5/20/1987 to 4/22/2005    4614
            Evaluating data  From 4/25/2005 to 6/24/2005    45
            Testing data     From 6/27/2005 to 7/26/2005    22
    WTI     Modeling data    From 1/2/1986 to 4/22/2005     4960
            Evaluating data  From 4/25/2005 to 6/24/2005    45
            Testing data     From 6/27/2005 to 7/26/2005    22
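
    The chronological division in Table 1 can be sketched as below for the Brent series. The function name

    is an assumption and the series is a placeholder of the reported length (4681 points), not the actual

    price data; only the part lengths come from Table 1.

```python
from typing import List, Sequence, Tuple


def split_series(
    series: Sequence[float], n_eval: int, n_test: int
) -> Tuple[List[float], List[float], List[float]]:
    """Split a series chronologically into modeling, evaluating and testing parts."""
    if n_eval <= 0 or n_test <= 0 or n_eval + n_test >= len(series):
        raise ValueError("evaluating/testing lengths do not fit the series")
    n_model = len(series) - n_eval - n_test
    modeling = list(series[:n_model])                    # oldest observations
    evaluating = list(series[n_model:n_model + n_eval])  # used to select parameters
    testing = list(series[n_model + n_eval:])            # held out for the final test
    return modeling, evaluating, testing


# Placeholder series with the Brent length reported in the text.
brent = [float(i) for i in range(4681)]
modeling, evaluating, testing = split_series(brent, n_eval=45, n_test=22)
print(len(modeling), len(evaluating), len(testing))
# 4614 45 22
```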

    4.3.2. Evaluation of forecasting results

    For Brent data, the forecasting results obtained from the PMRS method, the Elman network and the

    GPMGA method are illustrated in Fig. 3.

    Fig. 3 shows that the forecasting result of PMRS is close to the actual prices in terms of

    trend, but not at most individual points. The forecasting curve of PMRS is also too smooth to exhibit

    the violent fluctuation of oil prices within the short period. On the contrary, the forecasting result of the

    Elman neural network, which moves downward or upward dramatically, indicates that it is a

    powerful nonlinear tool. Although rather exact predictions are produced by the Elman network, its

    forecasting results fluctuate far more than the actual prices.

    In the results of GPMGA, the predicted values agree with the actual prices not only at

    quite a few points but also in the overall trend. The predicted curve is quite close to the actual curve

    of Brent prices at almost every point, which shows the best predictive ability of the three models.

    For WTI data, the forecasting results of the three models are shown in Fig. 4.

    Fig. 4 shows that the predicted values obtained from the PMRS model obviously deviate from the

    actual values in the late period, which is caused by the limitation of the small pattern size

    Fig. 4. Forecasting results of the three models for WTI data (From 6/27/2005 to 7/26/2005).

    Fig. 3. Forecasting results of the three models for Brent data (From 6/27/2005 to 7/26/2005).

    determined by PMRS, though the forecasting result in the early period is very close to the actual

    price. The Elman network behaves well at only a few points, at some of which the predicted and

    actual values even coincide, but it behaves poorly at most other points.

    Unlike the above two models, GPMGA captures the fluctuation pattern accurately and produces a

    better result not only locally but also over the whole prediction period.

    The Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) of

    each model are shown in Table 4.
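
    For reference, the two error measures can be computed as in the sketch below. It assumes the standard

    definitions (RMSE as the square root of the mean squared error, MAPE as the mean of |error / actual|

    expressed in percent), since the paper's own formulas do not appear in this section. The sample values

    are illustrative only.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


actual = [50.0, 60.0]
predicted = [51.0, 57.0]
print(round(rmse(actual, predicted), 4))  # 2.2361
print(round(mape(actual, predicted), 4))  # 3.5
```

    RMSE is expressed in the price unit (dollar/barrel), whereas MAPE is scale-free, which is why the two

    are reported side by side.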

    The empirical study, a 21-step prediction of Brent and WTI oil prices (roughly a month),

    illustrates the predictive ability of the three models. The prediction errors (MAPE) of these models are

    all below 8%. Among them, the MAPE of the GPMGA model is around 2% for both Brent and WTI

    (2.43% and 1.57%, respectively, in Table 4), which is lower than those of PMRS and the Elman network.

    This result shows the effectiveness of GPMGA in dealing with multi-step predictions of oil prices.

    The oil price rose rapidly in the period before 6/24/2005, yet the decrease that followed

    is predicted accurately by the GPMGA model, which

    implies that GPMGA can forecast the position of inflection points somewhat more accurately by

    referring to historical information. GPMGA therefore avoids the disadvantage of linear

    prediction, which tends to extrapolate a rising trend in earlier prices into continued rises in future

    prices. This empirical study thus provides strong evidence that GPMGA can capture the nonlinear

    characteristics of price movements in oil time series.

    However, GPMGA takes much longer to implement the modeling process. In our case, it took

    about 97 h.

    4.4. Discussion

    The fluctuation of oil prices is often related to the macro situation and key events in the world. In

    general, political events, military conflicts, serious climate abnormalities, catastrophes and

    accidents in important oil-producing areas lead to sharp changes in oil prices. Therefore, any

    long-term prediction should take them into account. However, reality is so complicated that

    it is hard to capture the similarity among all the events or scenarios that may affect oil prices.

    As shown above, the GPMGA model finds the similarity between the patterns starting from Sep. 2003

    and Jun. 2005, and accordingly gives a reasonably satisfying forecast for Jul. 2005. We therefore explore

    the events behind these patterns.

    In Sep. 2003, the events behind the rapid rise in oil prices were: i) the deteriorating situation in Iraq

    after the war and the problematic postwar reconstruction, with fears that the situation in the Middle

    East region might deteriorate further and lead to a shortage of oil supply; ii) the world economy appeared to

    rebound and recover. With economic recovery in most countries, oil demand increased by 7%

    compared to before 2003, but oil production increased by less than 4% during this period; iii)

    OPEC adopted the strategy of limiting oil production to maintain prices, cutting oil production even

    though demand increased; iv) world oil stocks reached their lowest levels in 10 years. This also

    Table 4

    Predicted errors of three models

    Market  Measure   Elman network  PMRS    GPMGA
    Brent   RMSE      2.1661         1.9618  1.7746
            MAPE(%)   3.23           2.93    2.43
    WTI     RMSE      1.7733         1.5662  1.0909
            MAPE(%)   2.59           2.24    1.57

    included the US, whose stocks at the end of 2003 were 12% lower than in the same period of 2002,

    amounting to only 0.27 billion barrels, the lowest figure in 20 years.

    In June 2005, the influential factors behind rapidly rising oil prices were: i) the world economy

    was still recovering at a high rate, estimated at above 3%, and the sustained strong

    advancement of the US economy contributed to greater oil demand; ii) oil

    production and refining capacity were at their limits, with no easily attainable way to greatly increase oil

    production to meet the rising demand; iii) the US was entering the summer season, in which

    its demand peaks, and the public expected future oil price increases;

    iv) the persisting chaotic circumstances in the Middle East region and US uncertainties in Iraq; v)

    the arrival of the summer hurricane season and its serious impact on coastal oil-producing

    zones such as the Gulf of Mexico in the US. For example, Hurricane Ivan caused heavy losses in 2004, and

    hurricanes might cause even worse damage in 2005; vi) US oil stocks decreased from

    329 million bbl to 324.9 million bbl between 10 June and 1 July.

    5. Conclusions

    Pattern matching is introduced to the field of oil price prediction in this paper, and a new

    model, the generalized pattern matching based on genetic algorithm (GPMGA), is proposed to

    conduct multi-step forecasts of oil prices. In the GPMGA model, the past pattern most similar to

    the current pattern is searched for in historical observations, and future prices are predicted according to

    the historical rules represented by the matched past pattern. GPMGA overcomes some of the defects

    of PMRS and the Elman network in the prediction of long-memory time series.
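
    The basic idea of matching the current pattern against history can be illustrated with the simplified

    sketch below. It is not GPMGA: there is no genetic algorithm and no rescaling along the x-axis or y-axis,

    only an exhaustive search for the past window whose price changes are closest (in squared Euclidean

    distance) to the most recent ones. The function name, the distance measure and the toy series are

    assumptions made for illustration.

```python
from typing import List, Sequence


def match_and_forecast(series: Sequence[float], size: int, horizon: int) -> List[float]:
    """Find the past window most similar to the latest `size` points and
    replay the price changes that followed it as a `horizon`-step forecast."""
    n = len(series)
    if size < 2 or horizon < 1:
        raise ValueError("size must be >= 2 and horizon >= 1")
    # A candidate window must be followed by `horizon` known points and
    # must end before the current pattern begins.
    max_start = min(n - size - horizon, n - 2 * size)
    if max_start < 0:
        raise ValueError("series too short for this pattern size and horizon")
    diffs = [b - a for a, b in zip(series, series[1:])]
    current = diffs[n - size:]  # the size - 1 most recent price changes

    def distance(start: int) -> float:
        window = diffs[start:start + size - 1]
        return sum((w - c) ** 2 for w, c in zip(window, current))

    best = min(range(max_start + 1), key=distance)
    forecast, level = [], series[-1]
    # Apply the changes that followed the matched window to the latest price.
    for change in diffs[best + size - 1:best + size - 1 + horizon]:
        level += change
        forecast.append(level)
    return forecast


toy = [10.0, 11.0, 13.0, 12.0, 11.0, 20.0, 21.0, 23.0]
print(match_and_forecast(toy, size=3, horizon=2))
# [22.0, 21.0]
```

    In the toy series the latest changes (+1, +2) match the first window exactly, so the forecast replays the

    two decreases that followed it. A fixed window size and unscaled changes are the kind of restriction that

    the generalized matching described in the paper is meant to relax.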

    The empirical study for Brent and WTI crude oil prices illustrates the effectiveness of GPMGA.

    In this study, useful historical information is found by GPMGA, which has good global search

    capabilities, so that the best matched past pattern can be found both rapidly and accurately.

    GPMGA is not only a powerful model that can produce a better multi-step prediction, but also an

    important tool for information mining. By using it, the rules of oil price fluctuation related to the macro

    situation and key events can be investigated through similarities between past and current oil price

    movements, which makes GPMGA a practical analytic tool for detecting oil price movements.

    Acknowledgements

    The authors gratefully acknowledge the financial support from the National Natural Science

    Foundation of China (NSFC) under grants Nos. 70425001, 70573104 and 70371064, and the Key

    Projects from the Ministry of Science and Technology of China (grants 2001-BA608B-15, 2001-BA60501).

    We would also like to thank Professor R.S.J. Tol and the anonymous referees for their

    helpful suggestions and corrections on the earlier draft of our paper, according to which we improved

    the content.

    References

    Abramson, B., 1994. The design of belief network-based systems for price forecasting. Computers & Electrical

    Engineering 20, 163–180.

    Abramson, B., Finizza, A., 1991. Using belief networks to forecast oil prices. International Journal of Forecasting 7, 299–315.

    Abramson, B., Finizza, A., 1995. Probabilistic forecasts from probabilistic models: a case study in the oil market.

    International Journal of Forecasting 11, 63–72.

    Adrangi, B., Chatrath, A., Dhanda, K.K., Raffiee, K., 2001. Chaos in oil prices? Evidence from futures markets. Energy

    Economics 23, 405–425.

    Alvarez-Ramirez, J., Cisneros, M., Ibarra-Valdez, C., Soriano, A., 2002. Multifractal Hurst analysis of crude oil prices.

    Physica A 313, 651–670.

    Alvarez-Ramirez, J., Soriano, A., Cisneros, M., Suarez, R., 2003. Symmetry/anti-symmetry phase transitions in crude oil

    markets. Physica A 322, 583–596.

    Barone-Adesi, G., Bourgoin, F., Giannopoulos, K., 1998. Don't look back. Risk, August, pp. 100–103.

    Bernabe, A., Martina, E., Alvarez-Ramirez, J., Ibarra-Valdez, C., 2004. A multi-model approach for describing crude oil

    price dynamics. Physica A 338, 567–584.

    Box, G., Jenkins, G.M., Reinsel, G., 1994. Time Series Analysis: Forecasting and Control, Third edition. Prentice Hall.

    Claudio, M., 2001. A semiparametric approach to short-term oil price forecasting. Energy Economics 23, 325–338.

    Crowder, W., Hamed, A., 1994. A cointegration test for oil futures market efficiency. Journal of Futures Markets 13 (8),

    933–941.

    Dominguez, K.M., 1989. The volatility and efficiency of crude oil futures contracts, ch. 2. In: Dominguez, K.M., Strong, J.S.,

    Weiner, R.J. (Eds.), Oil and money: Coping with price risk through financial markets. Harvard International Energy

    Studies, pp. 48–97.

    Farmer, J.D., Sidorowich, J.J., 1988. Predicting chaotic dynamics. In: Kelso, J.A.S., Mandell, A.J., Shlesinger, M.F. (Eds.),

    Dynamic Patterns in Complex Systems. World Scientific, Singapore, pp. 265–292.

    Fong, W.M., See, K.H., 2002. A Markov switching model of the conditional volatility of crude oil futures prices. Energy

    Economics 24, 71–95.

    Gil-Alana, L.A., 2001. A fractionally integrated model with a mean shift for the US and the UK real oil prices. Economic

    Modelling 18, 643–658.

    Gourieroux, C., 1997. ARCH Models and Financial Applications. Springer-Verlag.

    Gulen, S.G., 1998. Efficiency in the crude oil futures markets. Journal of Energy Finance & Development 3 (1), 13–21.

    He, Y., Chu, F., Zhong, B., 2002. A hierarchical evolutionary algorithm for constructing and training wavelet networks.

    Journal of Energy Finance & Development 10, 357–366.

    Kaboudan, M.A., 2001. Compumetric forecasting of crude oil prices. Proceedings of the 2001 Congress on Evolutionary

    Computation, vol. 1, pp. 283–287.

    Kermanshahi, B., Iwamiya, H., 2002. Up to year 2020 load forecasting using neural nets. Electrical Power and Energy

    Systems 24, 789–797.

    Kodogiannis, V.S., Lolis, A., 2002. Forecasting financial time series using neural network and generalized system-based

    techniques. Neural Computing & Applications 11, 90–102.

    Leigh, W., Odisho, E., Paz, N., Paz, M., 2000. Progress report: improving the stock price forecasting performance of the

    bull flag heuristic with genetic algorithms and neural networks. IEA/AIE 2000, 617–622.

    Liu, J.N.K., Kwong, R.W.M., Bo, F., 2004. Chart patterns recognition and forecast using wavelet and radial basis function

    network. KES 2004, 564–571.

    Moosa, I.A., Al-Loughani, N.E., 1994. Unbiasedness and time varying risk premia in the crude oil futures markets. Energy

    Economics 16 (2), 99–105.

    Motnikar, B.S., Pisanski, T., Cepar, D., 1996. Time-series forecasting by pattern imitation. OR-Spektrum 18 (1), 43–49.

    Nunnari, G., 2004. Modelling air pollution time-series by using wavelet functions and genetic algorithms. Soft Computing

    8, 173–178.

    Panas, E., Ninni, V., 2000. Are oil markets chaotic? A non-linear dynamic analysis. Energy Economics 22, 549–568.

    Peters, E., 1994. Fractal Market Hypothesis: Applying Chaos Theory to Investment and Economics. Wiley.

    Prigmore, M., Long, J.A., 2003. A comparison of the effectiveness of neural and wavelet networks for insurer credit rating

    based on publicly available financial data. IEA/AIE 2003, 527–536.

    Robinson, P.M., Yajima, Y., 2002. Determination of cointegrating rank in fractional systems. Journal of Econometrics 106,

    217–241.

    Sadorsky, P., 2006. Modeling and forecasting petroleum futures volatility. Energy Economics 28, 467–488.

    Sfetsos, A., Coonick, A.H., 2000. Univariate and multivariate forecasting of hourly solar radiation with artificial

    intelligence techniques. Solar Energy 68 (2), 169–178.

    Singh, S., 1999a. Noise impact on time-series forecasting using an intelligent pattern matching technique. Pattern

    Recognition 32, 1389–1398.

    Singh, S., 1999b. A long memory pattern modeling and recognition system for financial time-series forecasting. Pattern

    Analysis & Applications 2, 264–273.

    Singh, S., 2001. Multiple forecasting using local approximation. Pattern Recognition 34, 443–455.

    Singh, S., Fieldsend, Jonathan, 2001. Pattern matching and neural networks based hybrid forecasting system. ICAPR

    2001, 72–82.

    Stefansson, A., Koncar, N., Jones, A.J., 1997. A note of the gamma test. Neural Computing & Applications 5, 131–133.

    Tang, L., Hammoudeh, S., 2002. An empirical exploration of the world oil price under the target zone model. Energy

    Economics 24, 577–596.

    Weiss, S.M., Kulikowski, C.A., 1991. Computer Systems That Learn. Morgan Kaufmann.

    Wilson, I.D., Paris, S.D., Ware, J.A., Jenkins, D.H., 2002. Residential property price time series forecasting with neural

    networks. Knowledge-Based Systems 15, 335–341.

    Ye, M., Zyren, J., Shore, J., 2002. Forecasting crude oil spot price using OECD petroleum inventory levels. International

    Advances in Economic Research 8, 324–334.

    Ye, M., Zyren, J., Shore, J., 2005. A monthly crude oil spot price forecasting model using relative inventories. International

    Journal of Forecasting 21, 491–501.

    Ye, M., Zyren, J., Shore, J., 2006a. Forecasting short-run crude oil price using high- and low-inventory variables. Energy

    Policy 34, 2736–2743.

    Ye, M., Zyren, J., Shore, J., 2006b. Short-run crude oil price and surplus production capacity. International Advances in

    Economic Research 12, 390–394.

    Yousefi, S., Weinreich, I., Reinarz, D., 2005. Wavelet-based prediction of oil prices. Chaos, Solitons and Fractals 25,

    265–275.

    Zhang, G., Patuwo, B.E., Hu, M.Y., 1998. Forecasting with artificial neural networks: the state of the art. International

    Journal of Forecasting 14, 35–62.
