A generalized pattern matching approach for multi-step prediction of crude oil price
Ying Fan a,1, Qiang Liang a,b,2, Yi-Ming Wei a,*,3
a Center for Energy and Environmental Policy Research, Institute of Policy and Management,
Chinese Academy of Sciences, Beijing, 100080, China
b Graduate University, Chinese Academy of Sciences, Beijing, 100080, China
Received 10 November 2005; received in revised form 22 October 2006; accepted 22 October 2006
Available online 5 December 2006
Available online at www.sciencedirect.com
Energy Economics 30 (2008) 889–904
www.elsevier.com/locate/eneco
doi:10.1016/j.eneco.2006.10.012
0140-9883/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
Grant sponsors: National Natural Science Foundation of China under grant Nos. 70425001, 70573104 and 70371064,
and the Key Projects of National Science and Technology of China (2001-BA608B-15, 2001-BA605-01).
* Corresponding author. Institute of Policy and Management (IPM), Chinese Academy of Sciences (CAS), P.O. Box 8712,
Beijing 100080, China. Tel./fax: +86 10 62650861.
E-mail address: [email protected] (Y.-M. Wei).
1 Dr. Ying Fan is a Professor at the Institute of Policy and Management, Chinese Academy of Sciences, China. Her
research field is energy policy and system engineering. In 2004, she was a visiting scholar at Cornell University, USA.
2 Mr. Qiang Liang is a Ph.D. candidate in Management Science at the Institute of Policy and Management, Chinese
Academy of Sciences, China.
3 Dr. Yi-Ming Wei is a Professor at the Institute of Policy and Management of the Chinese Academy of Sciences, China.
He was a visiting scholar at Harvard University in the United States in 2005.
Abstract
This paper applies the pattern matching technique to multi-step prediction of crude oil prices and proposes a new
approach: generalized pattern matching based on genetic algorithm (GPMGA), which can be used to forecast
future crude oil prices based on historical observations. This approach can detect, from the historical data, the
pattern most similar to the one in contemporary crude oil prices. Based on the similar historical pattern, a multi-step
prediction of future crude oil prices can be worked out. In the GPMGA modeling process, traditional pattern
matching is not directly employed. Historical data are transformed to larger or smaller scales in the x-axis and the
y-axis directions, so that a generalized price pattern reflecting the current price movement can be obtained. This
treatment overcomes the local deficiency of the traditional pattern modeling and recognition system (PMRS)
approach, and in addition a matched historical pattern of larger pattern size can be found. Since the
approach takes not only historical similarities but also differences into account, the concept of generalized
pattern matching is proposed here. It provides a new basis for multi-step prediction by finding more essential
similarities through various transformations. The related empirical study is constructed for a one-month
forecast of the Brent and WTI crude oil prices, and satisfactory forecasting results are attained. Finally,
comparisons with some other time series prediction approaches, such as PMRS and the Elman network,
demonstrate the effectiveness and superiority of GPMGA over the others.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Pattern matching; Genetic algorithm; Crude oil price; Multi-step prediction
1. Introduction
Crude oil, sometimes called the blood of industries, plays an important role in every
economy. The oil price, as one of the main focal points in many countries, has become an
increasingly essential topic of concern to governments, enterprises and investors. Influenced
by many complicated factors, oil prices appear highly nonlinear and even chaotic, as Panas and
Ninni (2000) and Adrangi et al. (2001) pointed out, which makes it rather difficult to forecast
future oil prices, especially in multi-step prediction. Nevertheless, many different oil price
forecasting approaches have been proposed, because many decisions with regard to oil prices
have to be made on the basis of such forecasts.
Oil price forecasting approaches basically fall into two classes: single-factor time series models
and multi-factor models. The first class considers time as the independent variable and builds
mathematical models based on the oil price time series so as to produce future predictions, while
the latter also takes into account the main influential factors of oil prices, such as GDP, supply and
demand, and multi-variable models are often constructed to forecast future oil
prices. In comparison with multi-factor models, which have to foretell future values of the influencing
factors before predicting oil prices, single-factor models have the advantage of avoiding
the uncertainty involved in predicting other related variables.
There is a substantial literature on time series methods related to oil price prediction, but most
of it deals only with single-step forecasting. Dominguez (1989), Crowder and Hamed
(1994), Moosa and Al-Loughani (1994) and Gulen (1998) mention that the one-month forward
price is a remarkable indicator for short-term predictions of oil prices, and several studies
indicate that oil prices exhibit evident GARCH properties. A semi-parametric approach based
on the VAR prediction approach is suggested to obtain a forecast of the entire density function of
the price of an asset (Barone-Adesi et al., 1998). The forecasting ability of the one-month forward price as
an indicator of short-term oil prices is considered again by Claudio (2001), and the approach
proposed by Barone-Adesi et al. (1998) is adopted to perform a short-term forecast of the Brent oil
price. The Belief Network is introduced by Abramson and Finizza (1991) and Abramson (1994), and
Monte Carlo analysis is used to predict the crude oil price. Afterwards, the combination of
Belief Network and Probabilistic Model presented by Abramson and Finizza (1995) is applied
to make a probabilistic forecast in oil markets. Kaboudan (2001) uses compumetric methods to
perform short-term monthly forecasts of crude oil prices and suggests that genetic
programming has an advantage over random walk predictions, while the neural network forecast
proved inferior. Tang and Hammoudeh (2002) find that the nonlinear model based on the
Target Zone Theory can greatly improve oil price forecasting ability. Ye et al. (2002, 2005,
2006a,b) present a short-term forecasting model of monthly West Texas Intermediate crude
oil spot prices using OECD petroleum inventory levels. Yousefi et al. (2005) introduce a
wavelet-based prediction procedure, and market data on crude oil are used to provide forecasts
over different forecasting horizons. Sadorsky (2006) uses several different univariate and
multivariate statistical models such as TGARCH and GARCH to estimate forecasts of daily
volatility in petroleum futures price returns. Ye et al. (2006a,b) show the effect that surplus
crude oil production capacity has on short-term crude oil prices.
In recent years, a variety of new theories such as neural networks, Markov models and generalized
systems have been applied to financial prediction (Prigmore and Long, 2003; Kodogiannis and
Lolis, 2002; Fong and See, 2002). Peters (1994) points out that most financial markets are not
Gaussian distributed, but tend to have sharper peaks and fat tails, a phenomenon well known
in current practice. Given such evidence, many traditional methods based on the Gaussian
normality assumption fall short in making good forecasts. One of the key findings
explained by Peters (1994) is that most financial markets have a long memory; what happens
today affects the future forever. In other words, current data are correlated with all past data to
varying extents. The long-memory component of the market cannot be adequately explained
by a system that works with short-memory parameters. Therefore, prediction approaches based
on historical pattern matching should be chosen for long-memory systems. The Fractal Market
Hypothesis also offers sturdy support for the feasibility of historical pattern matching
approaches. Peters (1994) provided extensive evidence supporting his claim that markets have
a fractal structure based on different investment horizons, and that they cover a longer distance than
the square root of time, i.e. they are not random in nature. Instead, they consist of non-
periodic cycles which are hard to detect and use in statistical or neural forecasting. He argues
that financial markets are predictable, but accurate prediction algorithms need long memories.
Farmer and Sidorowich (1988) find that chaotic time series prediction using local
approximation techniques is much better than prediction using global approximations. Local approximation
refers to the idea of breaking up the time domain into small neighborhood regions and
analyzing them separately. As an empirical study, the combination of genetic algorithm, neural
network and chart pattern recognition is employed to forecast stock prices (Leigh et al., 2000).
The integration of chart pattern recognition and a wavelet radial basis function network is also
applied to the same task (Liu et al., 2004). The nearest neighbor approach presented by
Farmer and Sidorowich (1988) and the pattern imitation techniques presented by Motnikar
et al. (1996) are used for time series prediction. Recently, the pattern modeling and recognition
system (PMRS) presented by Singh (1999a,b; 2001) and Singh and Fieldsend (2001) has been adopted
for financial time series forecasting. The successful application of local approximation approaches
such as chart pattern recognition, nearest neighbor approaches, pattern imitation
techniques and PMRS indicates that local approximation approaches are quite effective
for the prediction of nonlinear time series.
The oil price time series is a nonlinear long-memory series (Alvarez-Ramirez et al., 2002, 2003;
Robinson and Yajima, 2002; Bernabe et al., 2004; Gil-Alana, 2001). As such, we can improve
forecasting ability for oil price time series by using local approximation approaches.
In this paper, a new local approximation approach based on genetic algorithm and generalized
pattern matching is proposed and implemented for the prediction of crude oil price time series.
The empirical study is constructed for a one-month forecast of the Brent and WTI crude oil
prices respectively, and satisfactory forecasting results are achieved. Finally, the comparison
between GPMGA and some other time series prediction approaches, such as PMRS and the Elman
neural network, demonstrates the effectiveness and superiority of GPMGA.
This paper is organized as follows: in Section 2, the generalized pattern matching approach is
proposed after the pattern modeling and recognition system is introduced, and the generalized
pattern matching approach based on genetic algorithm is described; in Section 3, the Elman
network is employed as a comparative approach; in Section 4, an empirical study is conducted to
show the precision of GPMGA; and the last section is the conclusion.
2. Generalized pattern matching based on genetic algorithm
In this section, the pattern modeling and recognition system is introduced; then the generalized
pattern matching approach is proposed; a parameter optimization based on the genetic algorithm
is described as well.
2.1. Pattern modeling and recognition system (PMRS)
Denote the time series as $y = \{y_1, y_2, \ldots, y_n\}$, where $n$ is its length. The past state
$s_p = \{y_{j-k+1}, y_{j-k+2}, \ldots, y_j\}$ most similar to the current state
$s_c = \{y_{n-k+1}, y_{n-k+2}, \ldots, y_n\}$ can be searched among the historical data of the time series
by using pattern matching approaches, where $k$ is the size of the state; state $s_p$ is the so-called
nearest neighbor of state $s_c$. Then the past state $\{y_{j+1}, y_{j+2}, \ldots, y_{j+m}\}$ is
used to forecast the future state $\{y_{n+1}, y_{n+2}, \ldots, y_{n+m}\}$, where $m$ is the prediction length.
A segment of the time series $y$ is defined as $\delta = (\delta_i, \delta_{i+1}, \ldots, \delta_{i+k-1})$, where $k$ is the size
of the segment, $1 \le i \le i+k-1 \le n-1$, $\delta_j = y_{j+1} - y_j$, $i \le j \le i+k-1$. The offset between two states
$\{y_p, y_{p+1}, \ldots, y_{p+k}\}$ and $\{y_q, y_{q+1}, \ldots, y_{q+k}\}$ is denoted as

$$\nabla = \sum_{i=1}^{k} w_i \left(\delta_{p+i-1} - \delta_{q+i-1}\right), \qquad 1 \le p \le p+k \le n, \quad 1 \le q \le q+k \le n,$$

where $w_i$ is the weight, $1 \le i \le k$. A pattern is defined as $\rho = (b_i, b_{i+1}, \ldots, b_{i+k-1})$, where $k$ is the
size of pattern $\rho$, $1 \le i \le i+k-1 \le n-1$, and

$$b_j = \begin{cases} 1, & y_{j+1} \ge y_j \\ 0, & y_{j+1} < y_j \end{cases} \qquad i \le j \le i+k-1.$$

Here $b_j$ is the so-called tag, and $i+k-1$ is called the marker position. The main point of PMRS is to
model a time series by directly matching its current pattern with past patterns; a forecast can then be made according to
the pattern following the most similar past pattern (Singh et al., 1999a,b; Singh, 2001; Singh and
Fieldsend, 2001).
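The nearest-neighbor idea behind PMRS can be illustrated with a minimal Python sketch. It is not the implementation of Singh's system: the function name `pmrs_forecast`, the use of absolute differences with uniform default weights in the offset, the omission of tag matching, and the choice to forecast by adding the past increments to the last observed value are all assumptions made for illustration.

```python
import numpy as np

def pmrs_forecast(y, k, m, weights=None):
    """Forecast m steps from the past segment most similar to the last k differences."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = np.diff(y)                                   # d[t] = y[t+1] - y[t]
    w = np.ones(k) if weights is None else np.asarray(weights, dtype=float)
    current = d[n - 1 - k:]                          # the k most recent differences
    best_end, best_offset = None, np.inf
    # A past segment d[s:s+k] ends at y[s+k]; m later observations must exist.
    for s in range(0, n - k - m):
        offset = np.sum(w * np.abs(d[s:s + k] - current))   # assumed absolute-value offset
        if offset < best_offset:
            best_end, best_offset = s + k, offset
    if best_end is None:
        raise ValueError("series too short for the requested k and m")
    # Assumption: reuse the increments that followed the matched past state.
    increments = y[best_end + 1:best_end + 1 + m] - y[best_end]
    return y[-1] + increments
```

On a strictly periodic toy series such as 1, 2, 3, 1, 2, 3, ... the sketch reproduces the continuation exactly, because an identical past segment exists; on real prices the match is only approximate.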
2.2. Generalized pattern matching (GPM)
PMRS can be considered a direct pattern matching method, since the current pattern and
the past pattern are matched without any transformation. Searching the historical data directly
for the past state most similar to the current one implies that current oil prices are expected
to change exactly according to the historical rules. However, the complexity of oil price
movements often makes this kind of direct matching inaccurate. On the one hand, historical
rules do not always appear in the same way. For example, oil prices may have risen by 2 dollars over
two months in the past, while later prices show a similar movement but at a
faster pace, i.e. taking only 1 month to rise by 2 dollars. Such phenomena may be
caused by differences in time and market conditions, although the rule behind them may be
similar. In other words, historical rules may recur only in a similar form rather than in exactly the same form. On
the other hand, a small pattern size is usually used for a strong local search in the PMRS method,
which ensures that the current pattern and the past pattern match well only over relatively short
periods, while deviating seriously at longer pattern sizes. Thus, when performing a multi-step
prediction, it is hard to find a similar state in the historical time series by using PMRS. Here, a
generalized pattern matching approach (GPM) is proposed in order to improve the PMRS method in
these two aspects.
2.2.1. The rationale of GPM
Denote a state $\{y_i, y_{i+1}, \ldots, y_{i+k-1}\}$ of the time series as $s(i,k)$, $1 \le i \le i+k-1 \le n$, where $i$, the
so-called beginning position of state $s$, indicates the position of the first element of state $s$ in the
time series $y$, and $k$ is the size of state $s$, namely the number of elements in state $s$. The
subsequent state of state $s$ is the state whose beginning position is just behind the last element of
state $s$. For instance, the subsequent state of state $s(i,k) = \{y_i, y_{i+1}, \ldots, y_{i+k-1}\}$ is
$s'(i+k, k') = \{y_{i+k}, y_{i+k+1}, \ldots, y_{i+k+k'-1}\}$, where $k'$ is the size of $s'$.
States in the time series are directly used to implement pattern matching in PMRS, while
in generalized pattern matching an appropriate scale transform is first applied to the states in both
the x-axis and the y-axis directions, which is equivalent to matching states after the historical
rules of oil price movements have been appropriately adjusted. Suppose the series $(x_1, x_2, \ldots, x_M)$ is
obtained from state $s(i,k)$ through a transformation; a pattern is then defined as $\rho(i, M, \alpha, \beta)$, where
$i$ and $M$ are the starting position and size of pattern $\rho$, and $\alpha$ and $\beta$ are respectively called the
scale factors in the x-axis and the y-axis directions of pattern $\rho$. The state $s(i,k)$ before the
transform is called the original state of pattern $\rho$. This procedure is described in detail as
follows.
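The original state is transformed in the x-axis direction by the factor $\alpha$. In general, suppose
$\alpha \in (0, +\infty)$; the original state $s(i,k)$ is processed through linear interpolation, and pattern $\rho$ is
attained after a scale transform in the x-axis direction. Denote $t = (j-1)/\alpha$ and $\underline{t} = \lfloor t \rfloor$; pattern $\rho$ and
its original state $s$ satisfy the following equations:

$$M = \lfloor k\alpha - \alpha + 1 \rfloor \tag{1}$$

$$x_j = y_{i+\underline{t}} + (t - \underline{t})\left(y_{i+\underline{t}+1} - y_{i+\underline{t}}\right) \tag{2}$$

where $1 \le j \le M$.
Likewise, the original state is transformed in the y-axis direction by the factor $\beta$. In general,
suppose $\beta \in (0, +\infty)$; the original state $s(i,k)$ is simply multiplied by $\beta$. Pattern $\rho$ and its original
state $s$ satisfy the following equations:

$$M = k \tag{3}$$

$$x_j = \beta\, y_{i+j-1}, \qquad 1 \le j \le k. \tag{4}$$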
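The two scale transforms can be sketched in a few lines of Python. The function names `scale_x`, `scale_y` and `transform` are hypothetical, and applying the x-axis interpolation first and the y-axis multiplication second is an assumption about how the two factors are combined; the text defines them separately.

```python
import math
import numpy as np

def scale_x(state, alpha):
    """Resample a state along the time axis by factor alpha using linear interpolation."""
    s = np.asarray(state, dtype=float)
    k = len(s)
    M = math.floor((k - 1) * alpha + 1)          # pattern size after the x-axis transform
    x = np.empty(M)
    for j in range(1, M + 1):
        t = (j - 1) / alpha                      # fractional position in the original state
        lo = min(math.floor(t), k - 1)           # left neighbour, clamped against rounding
        hi = min(lo + 1, k - 1)                  # right neighbour, clamped at the last point
        x[j - 1] = s[lo] + (t - lo) * (s[hi] - s[lo])
    return x

def scale_y(state, beta):
    """Multiply every value of a state by the price-scale factor beta."""
    return beta * np.asarray(state, dtype=float)

def transform(state, alpha, beta):
    """Assumed combination: stretch in time first, then rescale prices."""
    return scale_y(scale_x(state, alpha), beta)
```

For instance, `transform([0, 2, 4], 2, 0.5)` stretches three points to five and halves the values, so a 4-dollar rise over two steps becomes a 2-dollar rise over four steps.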
The transform factors $\alpha$ and $\beta$ represent the differences between the current price
fluctuation and the similar historical phenomenon. $\alpha$ reflects the relation between current price
movements and past ones in the time scale, given a certain price change. If $\alpha = 1$, the
current pattern takes on the same scale as the searched historical one, i.e., it takes the same
time to realize the given price change both at present and in the past. If $\alpha < 1$, the current
pattern is what the historical one scales down to, i.e., it takes a shorter period to fulfill the
price change at present than in the past. If $\alpha > 1$, the current pattern is what the historical one
scales up to, i.e., it takes a longer period to achieve the price change at present than in the
past. Similarly, $\beta$ reflects the relation between current price movements and past ones in the
price scale, given a certain period. If $\beta = 1$, the current pattern manifests the same scale as the
searched historical one, i.e., there is the same price change during the given period both at
present and in the past. If $\beta > 1$, the current pattern is what the historical one scales up to, i.e.,
a larger price change appears during the period at present than in the past. If $\beta < 1$, the current
pattern is what the historical one scales down to, i.e., a smaller price change takes place during
the period at present than in the past.
Unlike PMRS, which matches the current state and the past one directly, GPM takes the
differences between the current state and the past one into consideration in addition to their
similarities, and the past state is scaled in both the x-axis and the y-axis directions to match the
current one indirectly, so it is suitable to call it generalized pattern matching. When GPM is
adopted, a satisfactory past pattern most similar to the current one can be found even if a large
pattern size is required. Since the similarity between the current and the past patterns in the
generalized sense can be maintained over a long period, a more satisfactory result can be obtained when
performing a multi-step prediction.
2.2.2. The modeling process of GPM
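As in PMRS, the pattern structure in GPM is determined by the pattern size. For time
series $y$, suppose the current time is $n$ and an $m$-step prediction is required. Denote the actual values
in the prediction period as $\{y_{n+1}, y_{n+2}, \ldots, y_{n+m}\}$ and the predicted values as
$\{\hat{y}_{n+1}, \hat{y}_{n+2}, \ldots, \hat{y}_{n+m}\}$.
For a certain $k$, the current state $s_c(n-k+1, k)$ is taken as the current pattern directly, i.e.

$$\rho_c(n-k+1, k, 1, 1) = s_c(n-k+1, k). \tag{5}$$

The past pattern $\rho_h(i, k, \alpha, \beta)$ most similar to the current pattern $\rho_c$ can be calculated
by the genetic algorithm explained in the next section. Correspondingly, the original state of the past
pattern is $s_h(i, k')$. Since the scale transform in the y-axis direction does not change the pattern size,
the relation between $k$ and $k'$ follows from Eq. (1):

$$k = \lfloor k'\alpha - \alpha + 1 \rfloor. \tag{6}$$

Therefore, $k \le k'\alpha - \alpha + 1 < k + 1$, that is:

$$\frac{k + \alpha - 1}{\alpha} \le k' < \frac{k + \alpha}{\alpha}. \tag{7}$$

The minimal $k'$ satisfying Eq. (6) should be taken as the size of the original state $s_h$, i.e.

$$k' = \left\lfloor \frac{k + \alpha - 1}{\alpha} \right\rfloor. \tag{8}$$

So the original state of pattern $\rho_h(i, k, \alpha, \beta)$ is $s_h\left(i, \left\lfloor \frac{k+\alpha-1}{\alpha} \right\rfloor\right)$. Subsequently, the beginning position
of the consequent state $s'_h$ is $i + \left\lfloor \frac{k+\alpha-1}{\alpha} \right\rfloor$. Since
$\rho'_h\left(i + \left\lfloor \frac{k+\alpha-1}{\alpha} \right\rfloor, m, \alpha, \beta\right)$ is to be obtained from $s'_h$
through the transform, the size of $s'_h$ should be $\left\lfloor \frac{m+\alpha-1}{\alpha} \right\rfloor$. Pattern
$\rho'_h\left(i + \left\lfloor \frac{k+\alpha-1}{\alpha} \right\rfloor, m, \alpha, \beta\right)$ can be attained
from state $s'_h\left(i + \left\lfloor \frac{k+\alpha-1}{\alpha} \right\rfloor, \left\lfloor \frac{m+\alpha-1}{\alpha} \right\rfloor\right)$
through the same transform as the one for pattern $\rho_h$.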
Many local approximation approaches assume that local similarities exist in the time series,
namely that there are always similar periods in history. Under this assumption, if the past
pattern $\rho_h$ generated from the past state $s_h$ through scale transforms best matches the current
pattern $\rho_c$, then the pattern $\rho'_h$ generated from the consequent state $s'_h$ can reasonably be expected to
match the future pattern $\rho_f$. So $\rho_f$ can be directly derived from $\rho'_h$. Meanwhile, $\rho_f$ is
generated from the future state $s_f$ without any transform, just as $\rho_c$ is not transformed in either
the x-axis or the y-axis direction; therefore the future state $s_f$ can be derived from $\rho_f$ directly,
i.e. the result of the $m$-step prediction is:

$$\{\hat{y}_{n+1}, \hat{y}_{n+2}, \ldots, \hat{y}_{n+m}\} = s_f(n+1, m) = \rho_f(n+1, m, 1, 1) = \rho'_h\left(i + \left\lfloor \frac{k+\alpha-1}{\alpha} \right\rfloor, m, \alpha, \beta\right). \tag{9}$$
To quantify the difference between two patterns $\rho_1(i_1, M, \alpha_1, \beta_1) = \{x_1^1, x_2^1, \ldots, x_M^1\}$ and
$\rho_2(i_2, M, \alpha_2, \beta_2) = \{x_1^2, x_2^2, \ldots, x_M^2\}$ with the same pattern size in the pattern matching process, an offset
between patterns $\rho_1$ and $\rho_2$ is defined as follows:

$$\nabla = \sqrt{\frac{1}{M} \sum_{j=1}^{M} \left(x_j^1 - x_j^2\right)^2}. \tag{10}$$
To sum up, the modeling process of GPM can be described as follows. For each possible $k$, the
steps below should be gone through:
(1) Search for the historical pattern $\rho_h(i, k, \alpha, \beta)$ most similar to the current pattern $\rho_c(n-k+1, k, 1, 1)$,
whose original state is $s(n-k+1, k) = \{y_{n-k+1}, y_{n-k+2}, \ldots, y_n\}$, by minimizing the offset
between $\rho_h$ and $\rho_c$;
(2) Figure out the original state $s_h$ of pattern $\rho_h$ and its subsequent state $s'_h$ accordingly;
(3) Apply the same transform as the one for the best matched historical pattern $\rho_h(i, k, \alpha, \beta)$ to $s'_h$,
and compute the prediction $\{\hat{y}_{n+1}, \hat{y}_{n+2}, \ldots, \hat{y}_{n+m}\}$;
(4) Calculate the prediction error;
(5) Choose the optimal $k$ which minimizes the prediction error.
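Steps (1) to (3) for a single pattern size can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: an exhaustive grid over candidate scale factors stands in for the genetic algorithm, scale factors whose transformed sizes do not come out as exactly k and m are skipped, positions are 0-based, and all function names are hypothetical. The original-state size uses a floor, following the garbled source equation as reconstructed here.

```python
import math
import numpy as np

def transform(state, alpha, beta):
    """Stretch a state by alpha along time (linear interpolation) and by beta in price."""
    s = np.asarray(state, dtype=float)
    size = math.floor((len(s) - 1) * alpha + 1)      # pattern size after the x-axis transform
    t = np.arange(size) / alpha                      # fractional positions in the original state
    return beta * np.interp(t, np.arange(len(s)), s)

def original_size(size, alpha):
    """Size of the untransformed state assumed to yield a pattern of the given size."""
    return math.floor((size + alpha - 1) / alpha)

def offset(p1, p2):
    """Root mean square difference between two equally sized patterns."""
    return float(np.sqrt(np.mean((np.asarray(p1) - np.asarray(p2)) ** 2)))

def gpm_forecast(y, k, m, alphas, betas):
    """Grid search over (i, alpha, beta); returns (forecast, (i, alpha, beta), offset)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    current = y[n - k:]                              # current pattern, left untransformed
    best_offset, best_params = np.inf, None
    for alpha in alphas:
        k0, m0 = original_size(k, alpha), original_size(m, alpha)
        # Skip scale factors whose transformed sizes are not exactly k and m.
        if math.floor((k0 - 1) * alpha + 1) != k or math.floor((m0 - 1) * alpha + 1) != m:
            continue
        for i in range(n - k0 - m0 + 1):             # the subsequent state must be observed
            shape = transform(y[i:i + k0], alpha, 1.0)
            for beta in betas:
                d = offset(beta * shape, current)
                if d < best_offset:
                    best_offset, best_params = d, (i, alpha, beta)
    if best_params is None:
        raise ValueError("no admissible historical pattern for these settings")
    i, alpha, beta = best_params
    start = i + original_size(k, alpha)              # subsequent state begins right after
    follow = y[start:start + original_size(m, alpha)]
    return transform(follow, alpha, beta), best_params, best_offset
```

Steps (4) and (5) would wrap this function in a loop over candidate values of k and keep the one with the smallest error on held-out data.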
As the above procedure shows, the optimal pattern size $k$ is found by enumeration,
but the difference from PMRS is that a larger $k$ does not lead to a large departure between the searched
past patterns and the current pattern; instead, it makes the past patterns and the current pattern match
better over a longer period. This implies that GPM can more accurately reflect the historical rules of oil
price movement.
To search for the past pattern $\rho_h$ most similar to the current pattern $\rho_c$, the following optimization
problem should be solved: for a given pattern size $k$, find the optimal parameters $i$, $\alpha$ and $\beta$ by
minimizing the offset between the two patterns $\rho_c(n-k+1, k, 1, 1)$ and $\rho_h(i, k, \alpha, \beta)$. This problem
can be solved effectively by means of the genetic algorithm presented below.
2.2.3. Generalized pattern matching based on genetic algorithm (GPMGA)
The genetic algorithm, one of the global optimization methods, simulates the evolutionary process of
life forms in nature. Individuals of one generation exchange information with other individuals
through genetic operators such as selection and crossover, so that a new and better generation is
obtained. By repeating the process iteratively, an optimal solution can be found. The genetic
algorithm, which can avoid local optima in the search process, holds an advantage over
common local search methods. The combination of the genetic algorithm and other methods plays an
increasingly important role in the forecasting field. Leigh et al. (2000) integrate genetic
algorithm, neural network and chart pattern recognition to predict stock prices, and Nunnari (2004)
adopts wavelet function approximation based on genetic algorithm to forecast air pollution
time series; both attained good results.
In this paper, the genetic algorithm is employed to search for the optimal solution of the above
optimization problem. Suppose a pattern $\rho(i, k, \alpha, \beta)$ represents an individual, whose chromosome
must carry the information of three parameters: $i$, $\alpha$ and $\beta$. Thus the decimal strings of these
three parameters are assigned as the gene group of a chromosome.
In the process of evolution, the fitness function, which is used to evaluate whether an individual
is good enough, is defined as follows:

$$g(\rho) = \frac{1}{\gamma + \nabla} \tag{11}$$

where $\gamma$ is a constant and $\nabla$ is the offset between the past pattern $\rho_h$ and the current pattern $\rho_c$.
To keep the evolution moving towards better generations over time, the classical
roulette wheel approach combined with the elite strategy is implemented. During
roulette wheel selection, two mates are selected for reproduction with a probability that
is proportional to their fitness values.
The two-point crossover approach is implemented as the crossover operator. The substrings
defined by two randomly chosen points in the selected pair of strings are exchanged with a
certain probability.
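A minimal sketch of these operators follows. The chromosome is represented here as a NumPy array of digits, and the function names, the crossover probability `p_cross` and the way the elite individual is carried over unchanged are illustrative assumptions rather than details given in the text.

```python
import numpy as np

def roulette_select(fit, rng):
    """Pick two parent indices with probability proportional to fitness."""
    fit = np.asarray(fit, dtype=float)
    return rng.choice(len(fit), size=2, p=fit / fit.sum())

def two_point_crossover(a, b, rng, p_cross=0.8):
    """Exchange the substring between two random cut points with probability p_cross."""
    a, b = a.copy(), b.copy()
    if rng.random() < p_cross:
        lo, hi = sorted(rng.choice(len(a) + 1, size=2, replace=False))
        a[lo:hi], b[lo:hi] = b[lo:hi].copy(), a[lo:hi].copy()
    return a, b

def next_generation(population, fit, rng, p_cross=0.8):
    """Elite strategy: keep the fittest individual, fill the rest by selection and crossover."""
    population = np.asarray(population)
    children = [population[np.argmax(fit)].copy()]   # elite survives unchanged
    while len(children) < len(population):
        i, j = roulette_select(fit, rng)
        children.extend(two_point_crossover(population[i], population[j], rng, p_cross))
    return np.array(children[:len(population)])
```

Passing a seeded generator such as `np.random.default_rng(0)` as `rng` makes a run repeatable.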
Two-element swap mutation and self-adaptive mutation are implemented. The mutation
temperature of an individual $j$ is defined as follows (He et al., 2002):

$$T_j = U(0,1)\left(1 - \frac{g(\rho_j)}{\sum_{l=1}^{N} g(\rho_l)}\right) \tag{12}$$

where $U(0,1)$ is the uniform distribution over the range $[0,1]$.
Let $\theta_j$ represent one of the parameters $i$, $\alpha$ and $\beta$, $j = 1, 2, 3$; then the new
parameter $\theta'_j$ of the new individual after self-adaptive mutation can be defined as:

$$\theta'_j = \theta_j + \sigma T_j N(0,1) \tag{13}$$

where the constant $\sigma \in [0,1]$ is called the severity coefficient, and $N(0,1)$ is the standard normal
distribution.
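The fitness and self-adaptive mutation can be sketched as follows. The sketch assumes a fitness of the form 1/(gamma + offset), a temperature that multiplies a uniform draw by one minus the individual's share of total fitness, and one temperature per individual shared by its three parameters; the function names are hypothetical, and the two-element swap mutation is not shown.

```python
import numpy as np

def fitness(offsets, gamma=1.0):
    """Assumed fitness: larger when the offset to the current pattern is smaller."""
    return 1.0 / (gamma + np.asarray(offsets, dtype=float))

def mutation_temperature(fit, rng):
    """Fitter individuals receive lower temperatures and are therefore mutated less."""
    fit = np.asarray(fit, dtype=float)
    return rng.uniform(0.0, 1.0, size=fit.shape) * (1.0 - fit / fit.sum())

def self_adaptive_mutation(theta, temperature, sigma, rng):
    """Add Gaussian noise scaled by the severity coefficient sigma and the temperature.

    theta has shape (N, 3), one row (i, alpha, beta) per individual.
    """
    theta = np.asarray(theta, dtype=float)
    noise = rng.standard_normal(theta.shape)
    return theta + sigma * np.asarray(temperature)[:, None] * noise
```

In practice the mutated position i would still have to be rounded to a valid integer index and the scale factors kept positive; those repair steps are left out of the sketch.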
3. Elman network
The recurrent network developed by Elman has a simple architecture, and it can be trained
using the standard BP learning algorithm. The context units of the Elman network memorize
some past states of the hidden units, so the output of the network depends upon an aggregate of
the previous states and the current input. In this architecture, in addition to the input, hidden and
output units, there are also context units. The input and output units interact with the outside
environment, while the hidden and context units do not. The input units are only buffer units that
pass the signals without changing them. The output units are linear units which sum the signals
fed to them. The hidden units have nonlinear sigmoidal functions. The feedforward connections
are modifiable, but the recurrent connections are fixed. This network has been shown to be effective and was
applied as a benchmark model in the research of Kermanshahi and Iwamiya (2002) and
Kodogiannis and Lolis (2002).
3.1. Data standardization
Before neural network modeling, preprocessing the data by an appropriate transform can
greatly improve the modeling result. One of the two common transforms limits the
transformed data values to a certain interval; the other fixes the mean and the variance of
the transformed data. The latter is employed here, so that the
mean and the variance are 0 and 1. Denote an original observation as $y_t$, with mean $\bar{y}$ and variance
$D_y$; then the transformed data $y'_t$ can be expressed as follows:

$$y'_t = \frac{y_t - \bar{y}}{D_y}. \tag{14}$$
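A minimal sketch of this standardization is given below. It divides by the standard deviation, that is, it reads the scale term as the square root of the variance, because that is what yields unit variance; this reading, and the function names, are assumptions.

```python
import numpy as np

def standardize(y):
    """Return zero-mean, unit-variance data together with the statistics used."""
    y = np.asarray(y, dtype=float)
    mean, std = y.mean(), y.std()       # population standard deviation
    return (y - mean) / std, mean, std

def destandardize(z, mean, std):
    """Map network outputs back to the original price scale."""
    return np.asarray(z, dtype=float) * std + mean
```

The statistics would normally be computed on the modeling data only and then reused for the evaluating and testing data.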
3.2. Related parameters
3.2.1. Input layer
The number of nodes in the input layer represents the length of the window, or the number of
lagged observations used to discover the underlying pattern in a time series. This is the most
crucial variable for a forecasting problem, since the input vector contains important information about
the complex structure in the data. However, there is no widely accepted systematic way to determine
the optimum length of the input vector (Zhang and Patuwo, 1998); therefore the Gamma test,
autocorrelation and the ARMA model are used to analyze the data, and the most appropriate
number of input layer nodes is decided accordingly (Stefansson et al., 1997; Sfetsos and
Coonick, 2000).
3.2.2. Hidden layers
The number of hidden nodes should make the training error as small as possible while keeping the
network architecture as simple as possible, namely with as few hidden nodes as possible (Weiss and
Kulikowski, 1991). To enhance the network's performance, two hidden layers are considered.
Denote the numbers of hidden nodes in the two hidden layers as S1 and S2 respectively; their
values are chosen by minimizing the training error.
Fig. 1. The Brent crude oil prices (from 5/20/1987 to 7/26/2005).
3.2.3. Output layer
There are usually two forecasting approaches for neural networks, the iterative and the direct
approach (Wilson et al., 2002). The first approach requires only a single output node, with
forecast values being substituted into the input vector to make further predictions, while the
second one provides multiple output nodes that correspond to the forecasting horizon. In order to
decrease the number of output nodes and obtain a better result, we employ the first method.
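The iterative approach can be sketched independently of the network itself. In the sketch below, `one_step` is a hypothetical stand-in for any trained one-step predictor, such as an Elman network with a single output node; the function name and signature are assumptions.

```python
from collections import deque

def iterative_forecast(one_step, history, window, steps):
    """Multi-step forecast that feeds each prediction back into the input window."""
    buf = deque(history[-window:], maxlen=window)    # most recent `window` observations
    out = []
    for _ in range(steps):
        nxt = one_step(list(buf))                    # predict one step ahead
        out.append(nxt)
        buf.append(nxt)                              # oldest value drops out automatically
    return out

# Usage with a toy predictor that extrapolates the last difference linearly.
linear = lambda w: w[-1] + (w[-1] - w[-2])
print(iterative_forecast(linear, [1.0, 2.0, 3.0], window=2, steps=3))
```

Because every forecast beyond the first is built on earlier forecasts, errors can accumulate over the horizon, which is the usual trade-off against the direct approach.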
4. Empirical study
4.1. Data
Daily Brent and WTI crude oil prices, obtained from the IEA website, are used in the empirical
study. The price unit is dollar/barrel. To deal with a few missing data points, linear interpolation is
performed on the original data. In this study, there are 4681 observations of the Brent daily oil price, dating
from 5/20/1987 to 7/26/2005, as illustrated in Fig. 1.
Fig. 1 shows that the Brent oil price exhibits violent local fluctuation, and the fluctuation persists
even over a long period. The same holds for the WTI prices. The WTI data consist of 4933 daily
observations from 1/2/1986 to 7/26/2005.
To test the forecasting ability of GPMGA, PMRS and Elman network are employed as
comparative approaches.
All data are divided into three parts when PMRS, the Elman network and GPMGA are employed
to make a multi-step prediction, as shown in Fig. 2.
The modeling data are used for in-sample computation of the parameters of PMRS, the Elman
network and GPMGA. The evaluating data are used for selection of the best parameters of these
models. When a group of model parameters is calculated according to the modeling data, a
prediction of the evaluating data is obtained and the forecasting results are compared with the
Fig. 2. Data classification when modeling.
Table 1
The division of experimental data
Data    Parts            Period                         Length
Brent   Modeling data    From 5/20/1987 to 4/22/2005    4614
        Evaluating data  From 4/25/2005 to 6/24/2005    45
        Testing data     From 6/27/2005 to 7/26/2005    22
WTI     Modeling data    From 1/2/1986 to 4/22/2005     4960
        Evaluating data  From 4/25/2005 to 6/24/2005    45
        Testing data     From 6/27/2005 to 7/26/2005    22
4.3.2. Evaluation of forecasting results
For the Brent data, the forecasting results obtained from the PMRS method, the Elman network and the
GPMGA method are illustrated in Fig. 3.
Fig. 3 shows that the forecast of PMRS follows the trend of the actual prices
but misses most of the individual price levels. The forecasting curve of PMRS is also too smooth to exhibit
the violent fluctuation of oil prices within the short period. In contrast, the forecast of the
Elman neural network, which moves downward or upward dramatically, indicates that it is a
powerful nonlinear tool. Although the Elman network produces some rather exact predictions, its
forecasts fluctuate far more than the actual prices.
In the results obtained by GPMGA, the predicted values tally with the actual prices not only at
quite a few points but also in the overall trend. The predicted curve is quite close to the actual curve
of Brent prices at almost every point, which shows the best predictive ability among the three models.
For WTI data, the forecasting results of the three models are shown in Fig. 4.
Fig. 3. Forecasting results of the three models for Brent data (from 6/27/2005 to 7/26/2005).
Fig. 4. Forecasting results of the three models for WTI data (from 6/27/2005 to 7/26/2005).
Fig. 4 shows that the values predicted by the PMRS model clearly deviate from the actual
values in the late period, which is caused by the limitation of the small pattern size
determined by PMRS, although the forecast in the early period is very close to the actual
price. The Elman network performs well at only a few points, at some of which the predicted and
actual values even coincide, but it performs poorly at most other points.
Unlike the above two models, GPMGA captures the fluctuation pattern accurately and produces a
better result not only locally but also over the whole prediction period.
The Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) of
each model are shown in Table 4.
The empirical study, a 21-step prediction of Brent and WTI oil prices (roughly one month),
illustrates the predictive ability of the three models. The prediction errors of these models,
measured by MAPE, are all below 8%. Among them, the MAPE of the GPMGA model is around 2% for
both Brent and WTI, lower than those of PMRS and the Elman network. This result shows the
effectiveness of GPMGA in dealing with multi-step predictions of oil prices.
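For readers who wish to reproduce error measures of this kind, the following Python sketch computes RMSE and MAPE under their usual textbook definitions. The paper's own formulas are not reproduced in this section, so the definitions and the function names `rmse` and `mape` are assumptions; the prices in the example are made up for illustration.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: square root of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up three-day example, not data from the paper.
actual_prices = [50.0, 52.0, 55.0]
predicted_prices = [51.0, 52.0, 53.0]
print(round(rmse(actual_prices, predicted_prices), 4))
print(round(mape(actual_prices, predicted_prices), 2))
```

RMSE is expressed in the units of the price itself, whereas MAPE is relative to the price level, which is why the two measures are reported side by side.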
Oil prices rose rapidly in the period before 6/24/2005, yet the decreasing trend that followed
is accurately predicted by the GPMGA model. This implies that GPMGA, by referring to historical
information, can forecast the position of inflection points somewhat more accurately.
GPMGA therefore avoids the disadvantage of linear prediction, which tends to extrapolate a
rising trend in earlier prices into continued rises in future prices. This empirical study
provides strong evidence that GPMGA can capture the nonlinear characteristics of price
movements in oil time series.
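The limitation of linear prediction mentioned above can be seen in a small Python sketch. It fits an ordinary least-squares line to a rising price series and extends it, so the forecast can only keep rising. The function name `linear_trend_forecast` and the prices are hypothetical, and this is a generic linear extrapolation rather than any specific model from the paper.

```python
from typing import List, Sequence


def linear_trend_forecast(prices: Sequence[float], steps: int) -> List[float]:
    """Fit price = intercept + slope * t by least squares and extend the line."""
    n = len(prices)
    if n < 2:
        raise ValueError("at least two observations are needed to fit a line")
    t_mean = (n - 1) / 2
    p_mean = sum(prices) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_tp = sum((t - t_mean) * (p - p_mean) for t, p in enumerate(prices))
    slope = s_tp / s_tt
    intercept = p_mean - slope * t_mean
    # Forecast times continue directly after the last observed index n - 1.
    return [intercept + slope * (n + k) for k in range(steps)]


# Made-up rising prices: the fitted slope is positive, so every forecast step
# is higher than the one before and no turning point can be produced.
rising = [50.0, 51.0, 52.5, 53.0, 55.0]
forecast = linear_trend_forecast(rising, steps=3)
print(forecast)
```

A method that looks for similar historical patterns, by contrast, can forecast a reversal whenever the matched pattern was followed by one.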
However, GPMGA takes much longer to complete the modeling process. In our case, it took
about 97 h.
4.4. Discussion
The fluctuation of oil prices is often related to the macro situation and key events in the world.
In general, political events, military conflicts, serious climate abnormalities, catastrophes and
accidents in important oil-producing areas lead to sharp changes in oil prices. Therefore, any
long-term prediction should be based on them. However, reality is so complicated that
it is hard to capture the similarity among all the events or scenarios that may affect oil prices.
As shown above, the GPMGA model finds a similarity between the patterns starting in Sep. 2003
and Jun. 2005, and accordingly gives a somewhat satisfying forecast for Jul. 2005. We therefore
explore the events behind these patterns.
In Sep. 2003, the events behind the rapid rise in oil prices were: i) the deteriorating situation
in Iraq after the war and the problematic postwar reconstruction, with fears that the troubled
situation in the Middle East might worsen further and lead to a shortage of oil supply; ii) the
world economy appeared to rebound and recover, and with economic recovery in most countries oil
demand increased by 7% compared with before 2003, whereas oil production increased by less than
4% during this period; iii) OPEC adopted the strategy of limiting oil production to maintain
prices, cutting production even though demand increased; iv) world oil stocks reached their
lowest levels in 10 years. This included the US, whose stocks at the end of 2003 were 12% lower
than in the same period of 2002, amounting to only 0.27 billion barrels, the lowest figure
within 20 years.
Table 4
Prediction errors of the three models
Market   Measure     Elman network   PMRS     GPMGA
Brent    RMSE        2.1661          1.9618   1.7746
         MAPE (%)    3.23            2.93     2.43
WTI      RMSE        1.7733          1.5662   1.0909
         MAPE (%)    2.59            2.24     1.57
In June 2005, the influential factors behind the rapidly rising oil prices were: i) the recovery
of the world economy remained strong, at an estimated rate above 3%, and the sustained strong
growth of the US economy contributed to greater oil demand; ii) oil production and refining
capacity had reached their limits, with no easily attainable way to greatly increase oil
production to meet the rising demand; iii) the US was entering the summer season, in which its
demand peaks, and the public expected future oil price increases; iv) the chaotic circumstances
in the Middle East persisted, as did US uncertainties in Iraq; v) the summer hurricane season
was arriving, with its serious impact on coastal oil-producing zones such as the Gulf of Mexico
in the US; for example, Hurricane Ivan had caused heavy losses the previous year, and hurricanes
might cause even worse damage in 2005; vi) the US oil stock decreased from 329 million bbl to
324.9 million bbl in the period from 10 June to 1 July.
5. Conclusions
Pattern matching is introduced to oil price prediction in this paper, and a new model, the
generalized pattern matching based on genetic algorithm (GPMGA), is proposed to conduct
multi-step forecasts of oil prices. In the GPMGA model, historical observations are searched
for the past pattern most similar to the current pattern, and future prices are predicted
according to the historical rules represented by the matched past pattern. GPMGA overcomes some
of the defects of PMRS and the Elman network in the prediction of long-memory time series.
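The matching idea summarized above can be sketched in Python under strong simplifying assumptions. The sketch replaces the genetic algorithm with an exhaustive search over a few x-axis scales, models the y-axis transformation as a least-squares affine map, and measures similarity by the sum of squared residuals. These choices, and the names `resample`, `fit_affine` and `generalized_match_forecast`, are illustrative assumptions and not the authors' implementation.

```python
from typing import List, Optional, Sequence, Tuple


def resample(segment: Sequence[float], n: int) -> List[float]:
    """Linearly interpolate a segment onto n evenly spaced points (x-axis scaling)."""
    if n == 1 or len(segment) == 1:
        return [float(segment[0])] * n
    step = (len(segment) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * step
        lo = min(int(pos), len(segment) - 2)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[lo + 1] * frac)
    return out


def fit_affine(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares a, b for y ~ a * x + b (y-axis scaling), plus squared error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    a = 0.0 if sxx == 0.0 else sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    b = my - a * mx
    sse = sum((a * u + b - v) ** 2 for u, v in zip(x, y))
    return a, b, sse


def generalized_match_forecast(
    history: Sequence[float],
    pattern_len: int,
    horizon: int,
    x_scales: Sequence[float] = (0.5, 1.0, 2.0),
) -> List[float]:
    """Find the best rescaled past window and map what followed it to a forecast."""
    current = list(history[-pattern_len:])
    best: Optional[Tuple[float, List[float]]] = None
    for scale in x_scales:
        win = max(2, round(pattern_len * scale))  # past window length at this scale
        fut = max(2, round(horizon * scale))      # length of its continuation
        # The window and its continuation must end before the current pattern starts.
        last_start = len(history) - pattern_len - win - fut
        for s in range(last_start + 1):
            window = resample(history[s:s + win], pattern_len)
            a, b, sse = fit_affine(window, current)
            if a <= 0:  # skip flat or inverted matches
                continue
            if best is None or sse < best[0]:
                future = resample(history[s + win:s + win + fut], horizon)
                best = (sse, [a * v + b for v in future])
    if best is None:
        raise ValueError("history too short for the requested pattern and horizon")
    return best[1]


# Made-up series: the last six prices equal 2 * (first six) + 10.
history = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0,
           7.0, 7.0, 7.0, 7.0, 10.0, 12.0, 16.0, 14.0, 20.0, 18.0]
print(generalized_match_forecast(history, pattern_len=6, horizon=3))
```

An exhaustive search is only practical for a handful of scales and short series; a genetic algorithm, as used in the paper, is one way to explore a much larger space of transformations.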
The empirical study of Brent and WTI crude oil prices illustrates the effectiveness of GPMGA.
In this study, useful historical information is found by GPMGA, which has good global search
capabilities, so that the best matched past pattern can be found both rapidly and accurately.
GPMGA is not only a powerful model for multi-step prediction but also an important tool for
information mining. With it, the rules of oil price fluctuation related to the macro situation
and key events can be investigated through similarities between past and current oil price
movements, which highlights GPMGA as a practical analytic tool for detecting oil price movements.
Acknowledgements
The authors gratefully acknowledge the financial support from the National Natural Science
Foundation of China (NSFC) under grants Nos. 70425001, 70573104 and 70371064, and from the Key
Projects of the Ministry of Science and Technology of China (grants 2001-BA608B-15 and
2001-BA60501). We would also like to thank Professor R.S.J. Tol and the anonymous referees for
their helpful suggestions and corrections on the earlier draft of our paper, according to which
we improved the content.
References
Abramson, B., 1994. The design of belief network-based systems for price forecasting. Computers & Electrical
Engineering 20, 163180.
Abramson, B., Finizza, A., 1991. Using belief networks to forecast oil prices. International Journal of Forecasting 7, 299315.Abramson, B., Finizza, A., 1995. Probabilistic forecasts from probabilistic models: a case study in the oil market.
International Journal of Forecasting 11, 6372.
Adrangi, B., Chatrath, A., Dhanda, K.K., Raffiee, K., 2001. Chaos in oil prices? Evidence from futures markets. Energy
Economics 23, 405425.
902 Y. Fan et al. / Energy Economics 30 (2008) 889904
-
8/7/2019 A Generalized Pattern Matching Approach for Multi-step
15/16
Alvarez-Ramirez, J., Cisneros, M., Ibarra-Valdez, C., Soriano, A., 2002. Multifractal Hurst analysis of crude oil prices.
Physica A 313, 651670.
Alvarez-Ramirez, J., Soriano, A., Cisneros, M., Suarez, R., 2003. Symmetry/anti-symmetry phase transitions in crude oil
markets. Physica A 322, 583596.
Barone-Adesi, G., Bourgoin, F., Giannopoulos, K., 1998. Don't look back. Risk August, pp. 100103.Bernabe, A., Martina, E., Alvarez-Ramirez, J., Ibarra-Valdez, C., 2004. A multi-model approach for describing crude oil
price dynamics. Physica A 338, 567584.
Box, G., Jenkins, G.M., Reinsel, G., 1994. Time Series Analysis: Forecasting and ControlThird edition. Prentice Hall.
Claudio, M., 2001. A semiparametric approach to short-term oil price forecasting. Energy Economics 23, 325338.
Crowder, W., Hamed, A., 1994. A cointegration test for oil futures market efficiency. Journal of Futures Markets 13 (8),
933941.
Dominguez, K.M., 1989. The volatility and efficiency of crude oil futures contracts, ch.2. In: Dominguez, K.M., Strong, J.S.,
Weiner, R.J. (Eds.), Oil and money: Coping with price risk through financial markets. Harvard International Energy
Studies, pp. 4897.
Farmer, J.D., Sidorowich, J.J., 1988. Predicting chaotic dynamics. In: Kelso, J.A.S., Mandell, A.J., Shlesinger, M.F. (Eds.),
Dynamic Patterns in Complex Systems. World Scientific, Singapore, pp. 265292.
Fong, W.M., See, K.H., 2002. A Markov switching model of the conditional volatility of crude oil futures prices. EnergyEconomics 24, 7195.
Gil-Alana, L.A., 2001. A fractionally integrated model with a mean shift for the US and the UK real oil prices. Economic
Modelling 18, 643658.
Gourieroux, C., 1997. ARCH Models and Financial Applications. Springer-Verlag.
Gulen, S.G., 1998. Efficiency in the crude oil futures markets. Journal of Energy Finance & Development 3 (1), 13 21.
He, Y., Chu, F., Zhong, B., 2002. A hierarchical evolutionary algorithm for constructing and training wavelet networks.
Journal of Energy Finance & Development 10, 357366.
Kaboudan, M.A., 2001. Compumetric forecasting of crude oil prices. Proceedings of the 2001 Congress on Evolutionary
Computation, vol. 1, pp. 283287.
Kermanshahi, B., Iwamiya, H., 2002. Up to year 2020 load forecasting using neural nets. Electrical Power and Energy
Systems 24, 789797.
Kodogiannis, V.S., Lolis, A., 2002. Forecasting financial time series using neural network and generalized system-based
techniques. Neural Computing & Applications 11, 90102.
Leigh, W., Odisho, E., Paz, N., Paz, M., 2000. Progress report: improving the stock price forecasting performance of the
bull flag heuristic with genetic algorithms and neural networks. IEA/AIE 2000, 617622.
Liu, J.N.K., Kwong, R.W.M., Bo, F., 2004. Chart patterns recognition and forecast using wavelet and radial basis function
network. KES 2004, 564571.
Moosa, I.A., Al-Loughani, N.E., 1994. Unbiasedness and time varying risk premia in the crude oil futures markets. Energy
Economics 16 (2), 99105.
Motnikar, B.S., Pisanski, T., Cepar, D., 1996. Time-series forecasting by pattern imitation. OR-Spektrum 18 (1), 4349.
Nunnari, G., 2004. Modelling air pollution time-series by using wavelet functions and genetic algorithms. Soft Computing
8, 173178.
Panas, E., Ninni, V., 2000. Are oil markets chaotic? A non-linear dynamic analysis. Energy Economics 22, 549
568.Peters, E., 1994. Fractal Market Hypothesis: Applying Chaos Theory to Investment and Economics. Wiley.
Prigmore, M., Long, J.A., 2003. A comparison of the effectiveness of neural and wavelet networks for insurer credit rating
based on publicly available financial data. IEA/AIE 2003, 527536.
Robinson, P.M., Yajima, Y., 2002. Determination of cointegrating rank in fractional systems. Journal of Econometrics 106,
217241.
Sadorsky, P., 2006. Modeling and forecasting petroleum futures volatility. Energy Economics 28, 467488.
Sfetsos, A., Coonick, A.H., 2000. Univariate and multivariate forecasting of hourly solar radiation with artificial
intelligence techniques. Solar Energy 68 (2), 169178.
Singh, S., 1999a. Noise impact on time-series forecasting using an intelligent pattern matching technique. Pattern
Recognition 32, 13891398.
Singh, S., 1999b. A long memory pattern modeling and recognition system for financial time-series forecasting. Pattern
Analysis & Applications 2, 264273.Singh, S., 2001. Multiple forecasting using local approximation. Pattern Recognition 34, 443455.
Singh, S., Fieldsend, Jonathan, 2001. Pattern matching and neural networks based hybrid forecasting system. ICAPR
2001, 7282.
Stefansson, A., Koncar, N., Jones, A.J., 1997. A note of the gamma test. Neural Computing & Applications 5, 131 133.
903Y. Fan et al. / Energy Economics 30 (2008) 889904
-
8/7/2019 A Generalized Pattern Matching Approach for Multi-step
16/16
Tang, L., Hammoudeh, S., 2002. An empirical exploration of the world oil price under the target zone model. Energy
Economics 24, 577596.
Weiss, S.M., Kulikowski, C.A., 1991. Computer Systems That Learn. Morgan Kaufmann.
Wilson, I.D., Paris, S.D., Ware, J.A., Jenkins, D.H., 2002. Residential property price time series forecasting with neural
networks. Knowledge-Based Systems 15, 335341.Ye, M., Zyren, J., Shore, J., 2002. Forecasting crude oil spot price using OECD petroleum inventory levels. International
Advances in Economic Research 8, 324334.
Ye, M., Zyren, J., Shore, J., 2005. A monthly crude oil spot price forecasting model using relative inventories. International
Journal of Forecasting 21, 491501.
Ye, M., Zyren, J., Shore, J., 2006a. Forecasting short-run crude oil price using high- and low-inventory variables. Energy
Policy 34, 27362743.
Ye, M., Zyren, J., Shore, J., 2006b. Short-run crude oil price and surplus production capacity. International Advances in
Economic Research 12, 390394.
Yousefi, S., Weinreich, I., Reinarz, D., 2005. Wavelet-based prediction of oil prices. Chaos, Solitons and Fractals 25,
265275.
Zhang, G., Patuwo, B.E., Hu, M.Y., 1998. Forecasting with artificial neural networks: the state of the art. International
Journal of Forecasting 14, 3562.
904 Y. Fan et al. / Energy Economics 30 (2008) 889904