
A generalized pattern matching approach for multi-step prediction of crude oil price

Ying Fan a,1, Qiang Liang a,b,2, Yi-Ming Wei a,*,3

a Center for Energy and Environmental Policy Research, Institute of Policy and Management, Chinese Academy of Sciences, Beijing, 100080, China
b Graduate University, Chinese Academy of Sciences, Beijing, 100080, China

Received 10 November 2005; received in revised form 22 October 2006; accepted 22 October 2006

Available online 5 December 2006

Abstract

This paper applies a pattern matching technique to the multi-step prediction of crude oil prices and proposes a new
approach: generalized pattern matching based on genetic algorithm (GPMGA), which can be used to forecast
future crude oil prices from historical observations. The approach detects, in the historical data, the pattern most
similar to the contemporary crude oil price pattern. Based on that similar historical pattern, a multi-step
prediction of future crude oil prices can be worked out. In the GPMGA modeling process, traditional pattern
matching is not employed directly. Instead, historical data are transformed to larger or smaller scales in the x-axis and the
y-axis directions, so that a generalized price pattern reflecting the current price movement can be obtained. This
treatment overcomes the locality limitation of the traditional pattern modeling and recognition system
(PMRS) approach and, in addition, allows a matched historical pattern of larger pattern size to be found. Since the
approach takes into account not only historical similarities but also differences, the concept of generalized
pattern matching is proposed here. It provides a new basis for multi-step prediction by finding more essential
similarities through various transformations. The related empirical study is constructed for a one-month

forecasting of the Brent and WTI crude oil prices, and satisfying forecasting results are attained. At the end,
comparisons with some other time series prediction approaches, such as PMRS and Elman network,
demonstrate the effectiveness and superiority of GPMGA over others.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Pattern matching; Genetic algorithm; Crude oil price; Multi-step prediction

Energy Economics 30 (2008) 889–904
Available online at www.sciencedirect.com
www.elsevier.com/locate/eneco

Grant sponsors: National Natural Science Foundation of China under grant Nos. 70425001, 70573104 and 70371064, and the Key Projects of National Science and Technology of China (2001-BA608B-15, 2001-BA605-01).

* Corresponding author. Institute of Policy and Management (IPM), Chinese Academy of Sciences (CAS), P.O. Box 8712, Beijing 100080, China. Tel./fax: +86 10 62650861.
E-mail address: [email protected] (Y.-M. Wei).

1 Dr. Ying Fan is a Professor at the Institute of Policy and Management, Chinese Academy of Sciences, China. Her research field is energy policy and system engineering. In 2004, she was a visiting scholar at Cornell University, USA.

2 Mr. Qiang Liang is a Ph.D. candidate in Management Science at the Institute of Policy and Management, Chinese Academy of Sciences, China.

3 Dr. Yi-Ming Wei is a Professor at the Institute of Policy and Management of the Chinese Academy of Sciences, China. He was a visiting scholar at Harvard University in the United States in 2005.

0140-9883/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.eneco.2006.10.012

1. Introduction

Crude oil, sometimes called the blood of industry, plays an important role in every
economy. As one of the main focal points in many countries, the oil price has become an
increasingly essential topic of concern to governments, enterprises and investors. Influenced
by many complicated factors, oil prices appear highly nonlinear and even chaotic, as Panas and
Ninni (2000) and Adrangi et al. (2001) pointed out, which makes it rather difficult to forecast
future oil prices, especially in multi-step prediction. Nevertheless, a wide variety of oil price
forecasting approaches have been put forward, because many decisions with regard to oil prices
have to be made on the basis of such forecasts.

Oil price forecasting approaches basically fall into two classes: single-factor time series models
and multi-factor models. The former treat time as the independent variable and build
mathematical models on the oil price time series itself in order to produce future predictions, while
the latter also take into account the main factors influencing oil prices, such as GDP, supply and
demand, and usually construct multi-variable models to forecast future oil
prices. Multi-factor models have to predict the future values of the influencing
factors before predicting oil prices; single-factor models have the advantage of avoiding
the uncertainty involved in predicting those other variables.

There is a substantial literature on time series methods related to oil price prediction, but most
of it deals only with single-step forecasting. Dominguez (1989), Crowder and Hamed
(1994), Moosa and Al-Loughani (1994) and Gulen (1998) mention that the one-month forward
price is a remarkable indicator for short-term predictions of oil prices, and a few studies
indicate that oil prices exhibit evident GARCH properties. A semi-parametric approach based
on the VAR prediction approach is suggested to obtain a forecast of the entire density function of
the price of an asset (Barone-Adesi et al., 1998). The forecasting ability of the one-month forward price as
an indicator of short-term oil prices is considered again by Claudio (2001), and the approach
proposed by Barone-Adesi et al. (1998) is adopted to perform a short-term forecast of the Brent oil
price. The Belief Network is introduced by Abramson and Finizza (1991) and Abramson (1994), and
Monte Carlo analysis is used to predict the crude oil price. Afterwards, the combination of
Belief Network and probabilistic model presented by Abramson and Finizza (1995) is applied
to make probabilistic forecasts in oil markets. Kaboudan (2001) uses compumetric methods to
perform short-term monthly forecasts of crude oil prices and suggests that genetic
programming has an advantage over random walk predictions, while the neural network forecast
proved inferior. Tang and Hammoudeh (2002) find that a nonlinear model based on the
Target Zone Theory can greatly improve oil price forecasting ability. Ye et al. (2002, 2005,
2006a,b) present a short-term forecasting model of monthly West Texas Intermediate crude
oil spot prices using OECD petroleum inventory levels. Yousefi et al. (2005) introduce a
wavelet-based prediction procedure, and market data on crude oil are used to provide forecasts
over different forecasting horizons. Sadorsky (2006) uses several different univariate and
multivariate statistical models, such as TGARCH and GARCH, to estimate forecasts of daily
volatility in petroleum futures price returns. Ye et al. (2006a,b) show the effect that surplus
crude oil production capacity has on short-term crude oil prices.

In recent years, a variety of new theories such as neural networks, Markov models and generalized
systems have been applied to financial prediction (Prigmore and Long, 2003; Kodogiannis and
Lolis, 2002; Fong and See, 2002). Peters (1994) points out that most financial markets are not
Gaussian distributed, but tend to have sharper peaks and fat tails, a phenomenon now well known
in practice. Given this evidence, many traditional methods based on the Gaussian
assumption fall short in forecasting. One of the key findings
explained by Peters (1994) is that most financial markets have a long memory; what happens
today affects the future forever. In other words, current data are correlated with all past data to
varying extents. The long-memory component of the market cannot be adequately explained
by a system that works with short-memory parameters. Therefore, prediction approaches based
on historical pattern matching should be chosen for long-memory systems. The Fractal Market
Hypothesis also offers sturdy support for the feasibility of historical pattern matching
approaches. Peters (1994) provided extensive evidence supporting his claim that markets have
a fractal structure based on different investment horizons, and that they cover a longer distance than
the square root of time, i.e. they are not random in nature. Instead, they consist of non-periodic
cycles which are hard to detect and use in statistical or neural forecasting. He argues
that financial markets are predictable, but that accurate prediction algorithms need long memories.

Farmer and Sidorowich (1988) find that chaotic time series prediction using local
approximation techniques performs much better than global approximation. Local approximation
refers to the idea of breaking up the time domain into small neighborhood regions and
analyzing them separately. As an empirical study, the combination of genetic algorithm, neural
network and chart pattern recognition is employed to forecast stock prices (Leigh et al., 2000).
The integration of chart pattern recognition and a wavelet radial basis function network is also
applied to the same task (Liu et al., 2004). The nearest neighbor approach presented by
Farmer and Sidorowich (1988) and the pattern imitation techniques presented by Motnikar
et al. (1996) are used for time series prediction. Recently, the pattern modeling and recognition
system (PMRS) presented by Singh (1999a,b; 2001) and Singh and Fieldsend (2001) has been adopted
for financial time series forecasting. The successful application of local approximation approaches
such as chart pattern recognition, nearest neighbor approaches, pattern imitation
techniques and PMRS indicates that local approximation approaches are quite effective
for the prediction of nonlinear time series.

The oil price time series is a nonlinear long-memory series (Alvarez-Ramirez et al., 2002, 2003;
Robinson and Yajima, 2002; Bernabe et al., 2004; Gil-Alana, 2001). As such, the
forecasting ability for oil price time series can be improved by using local approximation approaches.
In this paper, a new local approximation approach based on genetic algorithm and generalized
pattern matching is proposed and implemented for the prediction of crude oil price time series.
The empirical study is constructed for a one-month forecast of the Brent and WTI crude oil
prices respectively, and satisfying forecasting results are achieved. Finally, the comparison
between GPMGA and some other time series prediction approaches, such as PMRS and the Elman
neural network, demonstrates the effectiveness and superiority of GPMGA.

This paper is organized as follows: in Section 2, the generalized pattern matching approach is

proposed after the pattern modeling and recognition system is introduced, and the generalized

pattern matching approach based on genetic algorithm is described; in Section 3, the Elman

network is employed as a comparative approach; in Section 4, an empirical study is conducted to

show the precision of GPMGA; and the last section is the conclusion.


2. Generalized pattern matching based on genetic algorithm

In this section, the pattern modeling and recognition system is introduced; then the generalized
pattern matching approach is proposed; a parameter optimization based on the genetic algorithm
is described as well.

2.1. Pattern modeling and recognition system (PMRS)

Denote the time series as $y = \{y_1, y_2, \ldots, y_n\}$, where $n$ is its length. The past state $s_p = \{y_{j-k+1}, y_{j-k+2}, \ldots, y_j\}$ most similar to the current state $s_c = \{y_{n-k+1}, y_{n-k+2}, \ldots, y_n\}$ can be searched for among the historical data of the time series by using pattern matching approaches, where $k$ is the size of the state; state $s_p$ is called the nearest neighbor of state $s_c$. Then the past state $\{y_{j+1}, y_{j+2}, \ldots, y_{j+m}\}$ is used to forecast the future state $\{y_{n+1}, y_{n+2}, \ldots, y_{n+m}\}$, where $m$ is the prediction length.

A segment of the time series $y$ is defined as $\delta = (\delta_i, \delta_{i+1}, \ldots, \delta_{i+k-1})$, where $k$ is the size of the segment, $1 \le i \le i+k-1 \le n-1$, and $\delta_j = y_{j+1} - y_j$, $i \le j \le i+k-1$. The offset between two states $\{y_p, y_{p+1}, \ldots, y_{p+k}\}$ and $\{y_q, y_{q+1}, \ldots, y_{q+k}\}$ is denoted as

\[ \nabla = \sum_{i=1}^{k} w_i \left( \delta_{p+i-1} - \delta_{q+i-1} \right), \qquad 1 \le p \le p+k \le n, \quad 1 \le q \le q+k \le n, \]

where $w_i$ is the weight, $1 \le i \le k$.

A pattern is defined as $\rho = (b_i, b_{i+1}, \ldots, b_{i+k-1})$, where $k$ is the size of pattern $\rho$, $1 \le i \le i+k-1 \le n-1$, and

\[ b_j = \begin{cases} 1, & y_{j+1} \ge y_j \\ 0, & y_{j+1} < y_j \end{cases} \qquad i \le j \le i+k-1. \]

$b_j$ is called the tag, and $i+k-1$ is called the marker position. The main point of PMRS is to model the current pattern of a time series by
directly matching its current pattern with past patterns; a forecast can then be made according to
the pattern following the most similar past pattern (Singh et al., 1999a,b; Singh, 2001; Singh and
Fieldsend, 2001).
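The segment and tag definitions above can be illustrated with a short sketch. The function names `segment` and `tags` are hypothetical and not part of PMRS; the code uses 0-based list indices where the text uses 1-based positions.

```python
def segment(y):
    """First differences: delta_j = y[j+1] - y[j] for consecutive observations."""
    return [y[j + 1] - y[j] for j in range(len(y) - 1)]


def tags(y):
    """Tag b_j is 1 when the series does not fall (y[j+1] >= y[j]), else 0."""
    return [1 if y[j + 1] >= y[j] else 0 for j in range(len(y) - 1)]


prices = [1, 3, 2, 2, 5]
print(segment(prices))  # differences between consecutive prices
print(tags(prices))     # direction tags for the same steps
```

A series of length n yields n - 1 differences and n - 1 tags, which matches the bound i + k - 1 ≤ n - 1 in the definitions.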

2.2. Generalized pattern matching (GPM)

PMRS can be considered a direct pattern matching method, since the current pattern and
the past pattern are matched without any transformation. Searching the historical data directly
for the past state most similar to the current one implies that current oil prices are expected
to change exactly according to the historical rules. However, the complexity of oil price
movements often makes this kind of direct matching inaccurate. On the one hand, historical
rules do not always appear in the same way. For example, oil prices may have risen by 2 dollars over
two months in the past, while a later movement is similar but
faster, i.e. it takes only 1 month to rise by 2 dollars. Such phenomena may be
caused by differences in time and market conditions, although the rule behind them may be
similar. In other words, historical rules may recur only with similarity rather than exactly. On
the other hand, a small pattern size is usually used for a strong local search in the PMRS method,
which ensures that the current pattern and the past pattern match well only over relatively short
periods, while they deviate seriously for larger pattern sizes. Thus, when performing a multi-step
prediction, it is hard to find a similar state in the historical time series by using PMRS. Here, a
generalized pattern matching approach (GPM) is proposed in order to improve the PMRS method in
these two aspects.

2.2.1. The rationale of GPM

Denote a state $\{y_i, y_{i+1}, \ldots, y_{i+k-1}\}$ of the time series as $s(i,k)$, $1 \le i \le i+k-1 \le n$, where $i$, called the beginning position of state $s$, indicates the position of the first element of state $s$ in the time series $y$, and $k$ is the size of state $s$, namely the number of elements in state $s$. The subsequent state $s'$ of state $s$ is the state whose beginning position is just behind the last element of state $s$. For instance, the subsequent state of state $s(i,k) = \{y_i, y_{i+1}, \ldots, y_{i+k-1}\}$ is $s'(i+k, k') = \{y_{i+k}, y_{i+k+1}, \ldots, y_{i+k+k'-1}\}$, where $k'$ is the size of $s'$.

States in the time series are used directly for pattern matching in PMRS, while in generalized pattern matching an appropriate scale transform is first imposed on the states in both the x-axis and the y-axis directions, which is equivalent to matching states after the historical rules of oil price movements have been appropriately adjusted. Suppose the series $(x_1, x_2, \ldots, x_M)$ is obtained from state $s(i,k)$ through a transformation; a pattern is then defined as $\rho(i, M, \alpha, \beta)$, where $i$ and $M$ are the starting position and size of pattern $\rho$, $\alpha$ and $\beta$ are respectively called the scale factors in the x-axis and the y-axis directions of pattern $\rho$, and the state $s(i,k)$ before the transform is called the original state of pattern $\rho$. This procedure is described in detail as follows.

The original state is transformed in the x-axis direction by the factor $\alpha$. In general, suppose $\alpha \in (0, +\infty)$; the original state $s(i,k)$ is processed through linear interpolation, and pattern $\rho$ is attained after a scale transform in the x-axis direction. Denote $t = (j-1)/\alpha$ and $\bar{t} = \lfloor t \rfloor$; pattern $\rho$ and its original state $s$ satisfy the following equations:

\[ M = \lfloor k\alpha - \alpha + 1 \rfloor \tag{1} \]

\[ x_j = y_{i+\bar{t}} + (t - \bar{t})\left( y_{i+\bar{t}+1} - y_{i+\bar{t}} \right) \tag{2} \]

where $1 \le j \le M$.

Likewise, the original state is transformed in the y-axis direction by the factor $\beta$. In general, suppose $\beta \in (0, +\infty)$; the original state $s(i,k)$ is simply multiplied by $\beta$. Pattern $\rho$ and its original state $s$ satisfy the following equations:

\[ M = k \tag{3} \]

\[ x_j = \beta\, y_{i+j-1}, \qquad 1 \le j \le k. \tag{4} \]
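The two scale transforms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `scale_transform` is hypothetical, the x-axis interpolation of Eqs. (1)-(2) and the y-axis multiplication of Eq. (4) are applied in a single pass, and the position `i` is 1-based as in the text.

```python
import math


def scale_transform(y, i, k, alpha=1.0, beta=1.0):
    """Build a pattern from the state s(i, k) of series y (i is 1-based).

    x-axis: linear interpolation with step 1/alpha (Eqs. (1)-(2));
    y-axis: multiplication by beta (Eq. (4)).
    """
    state = y[i - 1 : i - 1 + k]
    m = math.floor(k * alpha - alpha + 1)      # pattern size M, Eq. (1)
    pattern = []
    for j in range(1, m + 1):
        t = (j - 1) / alpha
        t_bar = min(math.floor(t), k - 1)      # guard against rounding at the end
        left = state[t_bar]
        right = state[min(t_bar + 1, k - 1)]   # last point has no right neighbor
        pattern.append(beta * (left + (t - t_bar) * (right - left)))
    return pattern


history = [1, 2, 3, 4, 5]
print(scale_transform(history, 1, 5, alpha=2.0))  # stretched in time
print(scale_transform(history, 1, 5, alpha=0.5))  # compressed in time
print(scale_transform(history, 1, 5, beta=2.0))   # doubled in price
```

With `alpha = 1` and `beta = 1` the pattern equals the original state, which is the case used for the current pattern.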

The transform factors $\alpha$ and $\beta$ represent the differences between the current price
fluctuation and the similar historical phenomenon. $\alpha$ reflects the relation between current price
movements and past ones in the time scale, given a certain price change. If $\alpha = 1$, the
current pattern takes on the same scale as the searched historical one, i.e., it takes the same
time to realize the given price change both at present and in the past. If $\alpha < 1$, the current
pattern is what the historical one scales down to, i.e., it takes a shorter period to fulfill the
price change at present than in the past. If $\alpha > 1$, the current pattern is what the historical one
scales up to, i.e., it takes a longer period to achieve the price change at present than in the
past. Similarly, $\beta$ reflects the relation between current price movements and past ones in the
price scale, given a certain period. If $\beta = 1$, the current pattern manifests the same scale as the
searched historical one, i.e., the same price change occurs during the given period both at
present and in the past. If $\beta > 1$, the current pattern is what the historical one scales up to, i.e.,
a larger price change appears during the period at present than in the past. If $\beta < 1$, the current
pattern is what the historical one scales down to, i.e., a smaller price change takes place during
the period at present than in the past.



Unlike PMRS, which matches the current state and the past one directly, GPM takes into
consideration the differences between the current state and the past one in addition to their similarities,
and the past state is scaled in both the x-axis and the y-axis directions to match the
current one indirectly, so it is suitably called "generalized pattern matching". When GPM is
adopted, a satisfying past pattern most similar to the current one can be found even if a large
pattern size is required. Since the similarity between the current and the past patterns in the
generalized sense can be maintained over a long period, a more satisfying result can be obtained when
performing a multi-step prediction.

2.2.2. The modeling process of GPM

As in PMRS, the pattern structure in GPM is determined by the pattern size. For the time
series $y$, suppose the current time is $n$ and an $m$-step prediction is required. Denote the actual values
in the prediction period as $\{y_{n+1}, y_{n+2}, \ldots, y_{n+m}\}$ and the predicted values as $\{\hat{y}_{n+1}, \hat{y}_{n+2}, \ldots, \hat{y}_{n+m}\}$.

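For a certain $k$, the current state $s_c(n-k+1, k)$ is taken as the current pattern directly, i.e.

\[ \rho_c(n-k+1, k, 1, 1) = s_c(n-k+1, k) \tag{5} \]

The past pattern $\rho_h(i, k, \alpha, \beta)$ which is most similar to the current pattern $\rho_c$ can be calculated by the genetic algorithm explained in the next section. Correspondingly, the original state of the past pattern is $s_h(i, k')$. Since the scale transform in the y-axis direction does not change the pattern size, the relation between $k$ and $k'$ follows from Eq. (1):

\[ k = \lfloor k'\alpha - \alpha + 1 \rfloor \tag{6} \]

Therefore $k \le k'\alpha - \alpha + 1 < k + 1$, that is:

\[ \frac{k + \alpha - 1}{\alpha} \le k' < \frac{k + \alpha}{\alpha} \tag{7} \]

The minimal $k'$ satisfying Eq. (6) should be taken as the size of the original state $s_h$, i.e.

\[ k' = \left\lceil \frac{k + \alpha - 1}{\alpha} \right\rceil \tag{8} \]

So the original state of pattern $\rho_h(i, k, \alpha, \beta)$ is $s_h\left(i, \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil\right)$. Subsequently, the beginning position of the subsequent state $s'_h$ is $i + \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil$. Since $\rho'_h\left(i + \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil, m, \alpha, \beta\right)$ is to be obtained from $s'_h$ through the transform, the size of $s'_h$ should be $\left\lceil \frac{m+\alpha-1}{\alpha} \right\rceil$. Pattern $\rho'_h\left(i + \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil, m, \alpha, \beta\right)$ can be attained from state $s'_h\left(i + \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil, \left\lceil \frac{m+\alpha-1}{\alpha} \right\rceil\right)$ through the same transform as the one for pattern $\rho_h$.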
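A small numerical check may help here. The sketch below assumes that the size of the original state is the smallest integer satisfying the inequality in Eq. (7), i.e. a ceiling, and verifies that this size maps back to the requested pattern size through Eq. (6). The function names are hypothetical.

```python
import math


def original_state_size(k, alpha):
    """Smallest integer k' with (k + alpha - 1) / alpha <= k' (assumed ceiling)."""
    return math.ceil((k + alpha - 1) / alpha)


def pattern_size(k_src, alpha):
    """Pattern size produced from an original state of size k_src, as in Eq. (1)."""
    return math.floor(k_src * alpha - alpha + 1)


for k, alpha in [(5, 1.0), (5, 2.0), (5, 0.5), (5, 0.7)]:
    k_src = original_state_size(k, alpha)
    print(k, alpha, k_src, pattern_size(k_src, alpha))
```

Note that for some combinations no integer lies in the interval of Eq. (7) (for example an even `k` with `alpha = 2`), in which case the transformed pattern is slightly longer than `k`; how such cases are treated is not specified in the text.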

Many local approximation approaches assume that local similarities exist in the time series, namely that there are always similar periods in history. On this assumption, if the past
pattern $\rho_h$ generated from the past state $s_h$ through scale transforms best matches the current
pattern $\rho_c$, then the pattern $\rho'_h$ generated from the subsequent state $s'_h$ is rationally expected to
match the future pattern $\rho_f$. So $\rho_f$ can be directly derived from $\rho'_h$. Meanwhile, $\rho_f$ is
generated from the future state $s_f$ without any transform, just as $\rho_c$ is not transformed in either
the x-axis or the y-axis direction; therefore the future state $s_f$ can be derived from $\rho_f$ directly,
i.e. the result of the $m$-step prediction is:

\[ \{\hat{y}_{n+1}, \hat{y}_{n+2}, \ldots, \hat{y}_{n+m}\} = s_f(n+1, m) = \rho_f(n+1, m, 1, 1) = \rho'_h\left(i + \left\lceil \frac{k+\alpha-1}{\alpha} \right\rceil, m, \alpha, \beta\right). \tag{9} \]

To quantify the difference between two patterns $\rho^1(i_1, M, \alpha_1, \beta_1) = \{x^1_1, x^1_2, \ldots, x^1_M\}$ and $\rho^2(i_2, M, \alpha_2, \beta_2) = \{x^2_1, x^2_2, \ldots, x^2_M\}$ with the same pattern size in the pattern matching process, an offset between patterns $\rho^1$ and $\rho^2$ is defined as follows:

\[ \nabla = \sqrt{\frac{1}{M} \sum_{j=1}^{M} \left( x^1_j - x^2_j \right)^2 }. \tag{10} \]

To sum up, the modeling process of GPM can be described as follows. For each possible $k$, the
steps below should be gone through:

(1) Search for the historical pattern $\rho_h(i, k, \alpha, \beta)$ most similar to the current pattern $\rho_c(n-k+1, k, 1, 1)$,
whose original state is $s(n-k+1, k) = \{y_{n-k+1}, y_{n-k+2}, \ldots, y_n\}$, by minimizing the offset
between $\rho_h$ and $\rho_c$;

(2) Work out the original state $s_h$ of pattern $\rho_h$ and its subsequent state $s'_h$ accordingly;

(3) Apply the same transform as the one for the best matched historical pattern $\rho_h(i, k, \alpha, \beta)$ to $s'_h$,
and compute the prediction $\{\hat{y}_{n+1}, \hat{y}_{n+2}, \ldots, \hat{y}_{n+m}\}$;

(4) Calculate the prediction error;

(5) Choose the optimal $k$ which minimizes the prediction error.

As the above procedure shows, the optimal pattern size $k$ is found by enumeration.
The difference from PMRS is that a larger $k$ does not lead to a great departure between the searched
past patterns and the current pattern; instead, it makes the past patterns and the current pattern match
well over a longer period. This implies that GPM can reflect the historical rules of oil
price movement more accurately.

To search for the past pattern $\rho_h$ most similar to the current pattern $\rho_c$, the following optimization
problem should be solved: for a given pattern size $k$, find the optimal parameters $i$, $\alpha$ and $\beta$ by
minimizing the offset between the two patterns $\rho_c(n-k+1, k, 1, 1)$ and $\rho_h(i, k, \alpha, \beta)$. This problem
can be solved effectively by means of a genetic algorithm presented below.
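Steps (1) to (3) can be sketched for a single pattern size as below. Note the substitution: an exhaustive search over small candidate grids of scale factors stands in for the genetic algorithm, purely to keep the example short and deterministic. All function names, the 0-based indexing, the ceiling used for the original state size, and the rule that the matched state and its successor must lie entirely before the current state are assumptions of this sketch.

```python
import math


def transform(y, start, size, alpha, beta):
    """Scale y[start:start+size] in time (alpha) and price (beta); 0-based start."""
    m = math.floor(size * alpha - alpha + 1)
    out = []
    for j in range(m):
        t = j / alpha
        lo = min(math.floor(t), size - 1)
        hi = min(lo + 1, size - 1)
        a, b = y[start + lo], y[start + hi]
        out.append(beta * (a + (t - lo) * (b - a)))
    return out


def offset(p1, p2):
    """Root mean square difference between two equally sized patterns, Eq. (10)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)) / len(p1))


def source_size(size, alpha):
    """Assumed size of the original state behind a pattern of the given size."""
    return math.ceil((size + alpha - 1) / alpha)


def gpm_forecast(y, k, m, alphas, betas):
    """Grid-search stand-in for steps (1)-(3); returns (forecast, best match)."""
    current = y[-k:]
    best = None
    for alpha in alphas:
        k_src, m_src = source_size(k, alpha), source_size(m, alpha)
        # Matched state and its successor must end before the current state begins.
        for start in range(len(y) - k - k_src - m_src + 1):
            for beta in betas:
                cand = transform(y, start, k_src, alpha, beta)
                if len(cand) != k:      # skip sizes that do not map back to k
                    continue
                d = offset(cand, current)
                if best is None or d < best[0]:
                    best = (d, start, alpha, beta)
    if best is None:
        raise ValueError("series too short for the requested k, m and alphas")
    _, start, alpha, beta = best
    nxt = start + source_size(k, alpha)
    forecast = transform(y, nxt, source_size(m, alpha), alpha, beta)[:m]
    return forecast, best


series = [1, 2, 4, 3, 5, 6, 2, 4, 8, 6]   # last four values repeat the first four, doubled
print(gpm_forecast(series, k=4, m=2, alphas=[1.0], betas=[1.0, 2.0]))
```

Steps (4) and (5) would wrap this call in a loop over candidate values of `k` and keep the one with the smallest error on held-out data.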

2.2.3. Generalized pattern matching based on genetic algorithm (GPMGA)

The genetic algorithm, one of the global optimization methods, simulates the evolutionary process of
life forms in nature. Individuals of one generation exchange information with other individuals
through genetic operators such as selection and crossover, so that a new and better generation is
obtained. By repeating the process iteratively, an optimal solution can be found. The genetic
algorithm, which can avoid being trapped in local optima during the search, holds an advantage over
common local search methods. The combination of genetic algorithm and other methods plays an
increasingly important role in the forecasting field. Leigh et al. (2000) integrate genetic
algorithm, neural network and chart pattern recognition to predict stock prices, and Nunnari (2004)
adopts wavelet function approximation based on genetic algorithm to forecast air pollution
time series; both attained good results.

In this paper, the genetic algorithm is employed to search for the optimal solution of the above
optimization problem. Suppose a pattern $\rho(i, k, \alpha, \beta)$ represents an individual, whose chromosome
must carry the information of three parameters: $i$, $\alpha$ and $\beta$. Thus the decimal strings of these
three parameters are assigned as the gene group of a chromosome.

In the process of evolution, the fitness function, which is used to evaluate whether an individual
is good enough, is defined as follows:

\[ g(\rho) = \frac{1}{\eta \nabla + 1} \tag{11} \]

where $\eta$ is a constant and $\nabla$ is the offset between the past pattern $\rho_h$ and the current pattern $\rho_c$.

To keep the evolution moving towards better generations over time, the classical
roulette wheel approach combined with the elite strategy is implemented. During
roulette wheel selection, two mates are selected for reproduction with a probability
proportional to their fitness values.

A two-point crossover is implemented as the crossover operator. The substrings
delimited by two randomly chosen points in the selected pair of strings are exchanged with a
certain probability.

Two-element swap mutation and self-adaptive mutation are implemented. The mutation
temperature of an individual $j$ is defined as follows (He et al., 2002):

\[ T_j = U(0,1) \left( 1 - \frac{g(\rho_j)}{\sum_{l=1}^{N} g(\rho_l)} \right) \tag{12} \]

where $U(0,1)$ is the uniform distribution over the range [0,1].

Let $\theta_j$ ($j = 1, 2, 3$) represent the parameters $i$, $\alpha$ and $\beta$ respectively; then the new
parameter $\theta'_j$ of the new individual after self-adaptive mutation can be defined as:

\[ \theta'_j = \theta_j + \sigma T_j N(0,1) \tag{13} \]

where the constant $\sigma \in [0,1]$ is called the severity coefficient, and $N(0,1)$ is the standard normal
distribution.
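The operators described in this subsection can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: individuals are plain lists of real-valued parameters instead of decimal strings, the elite strategy and the two-element swap mutation are omitted, one temperature is shared by all parameters of an individual, and every function name and default value (such as `eta=1.0`) is hypothetical.

```python
import random


def fitness(offset, eta=1.0):
    """Fitness decreases from 1 towards 0 as the offset grows (form of Eq. (11))."""
    return 1.0 / (eta * offset + 1.0)


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if pick <= acc:
            return individual
    return population[-1]   # fallback for floating point round-off


def two_point_crossover(a, b, rng):
    """Exchange the slice between two random cut points of two parents."""
    p, q = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:p] + b[p:q] + a[q:], b[:p] + a[p:q] + b[q:]


def mutation_temperature(j, fitnesses, rng):
    """Fitter individuals get a lower temperature (form of Eq. (12))."""
    return rng.uniform(0.0, 1.0) * (1.0 - fitnesses[j] / sum(fitnesses))


def self_adaptive_mutation(params, temperature, sigma, rng):
    """Perturb each parameter with Gaussian noise scaled by sigma and temperature."""
    return [p + sigma * temperature * rng.gauss(0.0, 1.0) for p in params]


rng = random.Random(0)
population = [[120.0, 0.8, 1.1], [300.0, 1.2, 0.9], [45.0, 1.0, 1.0]]  # [i, alpha, beta]
fits = [fitness(d) for d in (0.5, 2.0, 1.0)]
parent_a = roulette_select(population, fits, rng)
parent_b = roulette_select(population, fits, rng)
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
temp = mutation_temperature(0, fits, rng)
print(self_adaptive_mutation(child_a, temp, sigma=0.5, rng=rng))
```

In practice the mutated starting position would need rounding to an integer and all three parameters would need clipping to their valid ranges; the text does not say how this is done.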

3. Elman network

The recurrent network developed by Elman has a simple architecture, and it can be trained
using the standard back-propagation (BP) learning algorithm. The context units of the Elman network memorize
some past states of the hidden units, so the output of the network depends upon an aggregate of
the previous states and the current input. In this architecture, in addition to the input, hidden and
output units, there are also context units. The input and output units interact with the outside
environment, while the hidden and context units do not. The input units are only buffer units that
pass the signals without changing them. The output units are linear units which sum the signals
fed to them. The hidden units have nonlinear sigmoidal functions. The feedforward connections
are modifiable, but the recurrent connections are fixed. This network has been shown to be effective and was
applied as a benchmark model in the research of Kermanshahi and Iwamiya (2002) and
Kodogiannis and Lolis (2002).

3.1. Data standardization

Before neural network modeling, preprocessing the data with an appropriate transform can
greatly improve the modeling result. Two transforms are common: one confines the
transformed data values to a certain interval, and the other fixes the mean and the variance of
the transformed data at given values. The latter is employed here, so that the
mean and the variance become 0 and 1. Denote an original observation as $y_t$, with mean $\bar{y}$ and variance
$D_y$; then the transformed data $y'_t$ can be expressed as follows:

\[ y'_t = \frac{y_t - \bar{y}}{\sqrt{D_y}}. \tag{14} \]
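A minimal sketch of this standardization follows, assuming the population variance is used and that the data are divided by its square root so that the result has unit variance. The function name `standardize` is hypothetical; the mean and standard deviation are returned so that forecasts can be mapped back to price units.

```python
import math


def standardize(values):
    """Return (transformed values, mean, standard deviation) with zero mean, unit variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n   # population variance
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values], mean, std


scaled, mean, std = standardize([20.0, 22.0, 21.0, 25.0, 27.0])
restored = [z * std + mean for z in scaled]   # inverse transform back to prices
print(scaled, restored)
```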

3.2. Related parameters

3.2.1. Input layer

The number of nodes in the input layer represents the length of the window, or the number of
lagged observations used to discover the underlying pattern in a time series. This is the most
crucial variable for a forecasting problem, since the input vector contains important information about
the complex structure in the data. However, there is no widely accepted systematic way to determine
the optimum length of an input vector (Zhang and Patuwo, 1998); therefore the Gamma test,
autocorrelation and the ARMA model are used to analyze the data, and the most appropriate
number of input-layer nodes is decided accordingly (Stefansson et al., 1997; Sfetsos and
Coonick, 2000).

3.2.2. Hidden layers

The number of hidden nodes should make the training error as small as possible while keeping the
network architecture as simple as possible, namely with as few hidden nodes as possible (Weiss and
Kulikowski, 1991). To enhance the network's performance, two hidden layers are considered.
Denote the numbers of hidden nodes in the two hidden layers as S1 and S2 respectively; their
optimal values are chosen by minimizing the training error.

Fig. 1. The Brent crude oil prices (from 5/20/1987 to 7/26/2005).

3.2.3. Output layer

There are usually two multi-step forecasting approaches for neural networks, the iterative and the direct
approach (Wilson et al., 2002). The first requires only a single output node, with
forecast values being substituted into the input vector to make further predictions, while the
second provides multiple output nodes that correspond to the forecasting horizon. In order to
decrease the number of output nodes and achieve a better result, we employ the first method.
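The iterative approach can be sketched independently of any particular network. In the sketch below, `one_step` is a hypothetical stand-in for a trained single-output model (here a simple linear extrapolation, not an Elman network), and `iterative_forecast` is an illustrative name.

```python
def iterative_forecast(one_step, history, window, steps):
    """Feed each one-step forecast back into the input window to predict further ahead."""
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        nxt = one_step(buffer)
        forecasts.append(nxt)
        buffer = buffer[1:] + [nxt]   # slide the window over the new forecast
    return forecasts


def linear_extrapolation(window):
    """Toy one-step model: continue the last observed change."""
    return window[-1] + (window[-1] - window[-2])


print(iterative_forecast(linear_extrapolation, [1, 2, 3], window=2, steps=3))
```

Because each forecast becomes an input for the next step, errors can accumulate over the horizon, which is the usual trade-off against the direct approach.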

4. Empirical study

4.1. Data

Daily Brent and WTI crude oil prices, obtained from the IEA website, are used in the empirical
study. The price unit is dollar/barrel. To deal with a few missing data points, linear interpolation is
performed on the original data. In this study, there are 4681 observations of the Brent daily oil price dating
from 5/20/1987 to 7/26/2005, illustrated in Fig. 1.

Fig. 1 shows that the Brent oil price fluctuates violently at the local level, and that the fluctuation persists
even over a long period. The same holds for the WTI prices. The WTI data consist of 4933 daily
observations from 1/2/1986 to 7/26/2005.

To test the forecasting ability of GPMGA, PMRS and Elman network are employed as

comparative approaches.

All data are divided into three parts when PMRS, Elman network and GPMGA are employed

to make a multi-step prediction, shown as Fig. 2.

The modeling data are used for in-sample computation of the parameters of PMRS, Elman network and GPMGA. The evaluating data are used for selection of the best parameters of these

models. When a group of model parameters is calculated according to the modeling data, a

prediction of the evaluating data is obtained and the forecasting results are compared with the

Fig. 2. Data classification when modeling.

Table 1
The division of experimental data

Data    Parts             Period                         Length
Brent   Modeling data     From 5/20/1987 to 4/22/2005    4614
        Evaluating data   From 4/25/2005 to 6/24/2005    45
        Testing data      From 6/27/2005 to 7/26/2005    22
WTI     Modeling data     From 1/2/1986 to 4/22/2005     4960
        Evaluating data   From 4/25/2005 to 6/24/2005    45
        Testing data      From 6/27/2005 to 7/26/2005    22


4.3.2. Evaluation of forecasting results

For the Brent data, the forecasting results obtained from the PMRS method, the Elman network and the
GPMGA method are illustrated in Fig. 3.

Fig. 3 shows that the PMRS forecast follows the trend of the actual prices
but misses most individual price levels. The PMRS forecasting curve is also too smooth to exhibit
the violent fluctuation of oil prices within the short period. By contrast, the forecast of the
Elman neural network, which moves downward or upward dramatically, indicates that it is a
powerful nonlinear tool. Although the Elman network produces some rather exact predictions, its
forecasts fluctuate far more extremely than the actual prices.

In the results obtained by GPMGA, the predicted values agree with the actual prices not only at
quite a few points but also in the overall trend. The predicted curve is quite close to the actual curve
of Brent prices at almost every point, which shows the best predictive ability among the three models.

For WTI data, the forecasting results of the three models are shown in Fig. 4.

Fig. 4 shows that the values predicted by the PMRS model clearly deviate from the actual values in the late period, although the forecast in the early period is very close to the actual prices. This deviation is caused by the small pattern size determined by PMRS. The Elman network performs well at only a few points, at some of which the predicted and actual values even coincide, but it performs poorly at most other points.

Fig. 3. Forecasting results of the three models for Brent data (From 6/27/2005 to 7/26/2005).

Fig. 4. Forecasting results of the three models for WTI data (From 6/27/2005 to 7/26/2005).

Unlike the above two models, GPMGA captures the fluctuation pattern accurately and produces a better result not only locally but also over the whole prediction period.

The Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) of each model are shown in Table 4.

The empirical study, a 21-step prediction of Brent and WTI oil prices (roughly one month), illustrates the predictive ability of the three models. The MAPE of every model is below 8%. Among them, the MAPE of the GPMGA model is around 2% for both Brent and WTI (2.43% and 1.57% in Table 4), which is better than that of PMRS and the Elman network. This result shows the effectiveness of GPMGA in dealing with multi-step predictions of oil prices.
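For reference, the two error measures can be computed as in the sketch below. It assumes the standard textbook definitions of RMSE and MAPE, since this section does not restate the formulas used; the function names and the sample numbers are illustrative only and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical three-step forecast, not data from the paper.
actual = [50.0, 60.0, 40.0]
predicted = [51.0, 57.0, 40.0]

error_rmse = rmse(actual, predicted)  # square root of (1 + 9 + 0) / 3
error_mape = mape(actual, predicted)  # 100 * (0.02 + 0.05 + 0) / 3
```

RMSE is expressed in the units of the price series, while MAPE is scale-free, which is why MAPE is the more convenient measure for comparing the Brent and WTI results.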

Oil prices rose rapidly in the period before 6/24/2005, yet the GPMGA model accurately predicted the decline that followed. This implies that GPMGA can forecast the position of inflection points somewhat more accurately by referring to historical information. GPMGA therefore avoids the disadvantage of linear prediction, which tends to extrapolate a rising trend in earlier prices into continued rises in future prices. This empirical study provides strong evidence that GPMGA can capture the nonlinear characteristics of price movements in oil time series.

However, GPMGA takes a much longer time to carry out the modeling process. In our case, about 97 h were spent.

4.4. Discussion

The fluctuation of oil prices is often related to the macro situation and to key events in the world. In general, political events, military conflicts, serious climate abnormalities, catastrophes and accidents in important oil-producing areas lead to sharp changes in oil prices. Therefore, any long-term prediction should take them into account. However, reality is so complicated that it is hard to capture the similarity among all the events or scenarios that may affect oil prices.

As shown above, the GPMGA model finds a similarity between the patterns starting from Sep. 2003 and from Jun. 2005, and accordingly gives a fairly satisfying forecast for Jul. 2005. We therefore explore the events behind these patterns.

In Sep. 2003, the events behind the rapid rise in oil prices were: i) the deteriorating situation in Iraq after the war and the problematic postwar reconstruction, with fears that the turmoil in the Middle East region might worsen and lead to a shortage of oil supply; ii) the world economy appeared to rebound and recover. With economic recovery in most countries, oil demand increased by 7% compared to before 2003, but oil production increased by less than 4% during this period; iii) OPEC adopted the strategy of limiting oil production to maintain prices, cutting production even though demand increased; iv) world oil stocks reached their lowest levels in 10 years. This included the US, whose stocks at the end of 2003 were 12% lower than in the same period of 2002, amounting to only 0.27 billion barrels, the lowest figure in 20 years.

Table 4
Predicted errors of three models

Market   Measure     Elman network   PMRS     GPMGA
Brent    RMSE        2.1661          1.9618   1.7746
         MAPE (%)    3.23            2.93     2.43
WTI      RMSE        1.7733          1.5662   1.0909
         MAPE (%)    2.59            2.24     1.57

In June 2005, the factors behind the rapidly rising oil prices were: i) the world economy was still recovering at a high rate, estimated at above 3%, and the sustained strong growth of the US economy contributed to greater oil demand; ii) oil production and refining capacity were at their limits, with no easily attainable way to greatly increase production to meet the rising demand; iii) the US was entering the summer season, in which its demand peaks, and the public expected future oil price increases; iv) the persisting chaotic circumstances in the Middle East region and US uncertainties in Iraq; v) the arrival of the summer hurricane season and its serious impact on coastal oil-producing zones such as the Gulf of Mexico in the US. For example, Hurricane Ivan caused heavy losses in 2004, and hurricanes could cause even worse damage in 2005; vi) US oil stocks decreased from 329 million bbl to 324.9 million bbl between 10 June and 1 July.

5. Conclusions

This paper introduces pattern matching to the field of oil price prediction and proposes a new model, generalized pattern matching based on genetic algorithm (GPMGA), to conduct multi-step forecasts of oil prices. The GPMGA model searches the historical observations for the past pattern most similar to the current pattern and predicts future prices according to the historical rules represented by the matched past pattern. GPMGA overcomes some of the defects of PMRS and the Elman network in the prediction of long-memory time series.
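The idea of searching history for the most similar past pattern and reusing what followed it can be illustrated with a deliberately simplified sketch. It is not the GPMGA algorithm: it replaces the genetic search with a brute-force scan, fixes the pattern length, and allows only a y-axis scale and shift (no x-axis rescaling). The names `fit_affine` and `match_and_forecast` and the toy series are assumptions made for this example.

```python
def fit_affine(x, y):
    """Least-squares scale a and shift b such that a * x + b approximates y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        # A flat window carries no shape; fall back to a pure shift.
        return 0.0, mean_y
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = cov / var_x
    return a, mean_y - a * mean_x


def match_and_forecast(history, pattern_len, horizon):
    """Return (forecast, start) for the best-matching past window."""
    n = len(history)
    current = history[-pattern_len:]
    # Past windows must not overlap the current pattern and must be
    # followed by at least `horizon` known values.
    last_start = n - pattern_len - max(pattern_len, horizon)
    if last_start < 0:
        raise ValueError("history too short for this pattern_len and horizon")
    best = None
    for start in range(last_start + 1):
        window = history[start:start + pattern_len]
        a, b = fit_affine(window, current)
        err = sum((a * w + b - c) ** 2 for w, c in zip(window, current))
        if best is None or err < best[0]:
            best = (err, start, a, b)
    _, start, a, b = best
    followers = history[start + pattern_len:start + pattern_len + horizon]
    # Apply the same scale and shift to the values that followed the match.
    return [a * v + b for v in followers], start


# Hypothetical toy series: the last four values equal 2 * (first four) + 10.
history = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 7.0, 7.0, 7.0, 12.0, 16.0, 14.0, 20.0]
forecast, start = match_and_forecast(history, pattern_len=4, horizon=2)
```

On this toy series the scan matches the window starting at index 0 and returns the forecast [18.0, 22.0], that is, the two values that followed the matched window after the same scale and shift are applied. The brute-force scan only becomes impractical once the pattern length and the x-axis scaling are searched as well, which is where a global search such as a genetic algorithm is useful.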

The empirical study of Brent and WTI crude oil prices illustrates the effectiveness of GPMGA. In this study, GPMGA, which has good global search capabilities, finds useful historical information, so that the best matched past pattern can be found both rapidly and accurately. GPMGA is not only a powerful model that delivers better multi-step predictions, but also an important tool for information mining. With it, the rules of oil price fluctuation related to the macro situation and key events can be investigated through similarities between past and current oil price movements, which makes GPMGA a practical analytic tool for detecting oil price movements.

Acknowledgements

The authors gratefully acknowledge the financial support from the National Natural Science Foundation of China (NSFC) under grants Nos. 70425001, 70573104 and 70371064, and the Key Projects from the Ministry of Science and Technology of China (grants 2001-BA608B-15, 2001-BA60501). We would also like to thank Professor R.S.J. Tol and the anonymous referees for their helpful suggestions and corrections on the earlier draft of our paper, according to which we improved the content.

References

Abramson, B., 1994. The design of belief network-based systems for price forecasting. Computers & Electrical

Engineering 20, 163180.

Abramson, B., Finizza, A., 1991. Using belief networks to forecast oil prices. International Journal of Forecasting 7, 299315.Abramson, B., Finizza, A., 1995. Probabilistic forecasts from probabilistic models: a case study in the oil market.

International Journal of Forecasting 11, 6372.

Adrangi, B., Chatrath, A., Dhanda, K.K., Raffiee, K., 2001. Chaos in oil prices? Evidence from futures markets. Energy

Economics 23, 405425.



15/16

Alvarez-Ramirez, J., Cisneros, M., Ibarra-Valdez, C., Soriano, A., 2002. Multifractal Hurst analysis of crude oil prices.

Physica A 313, 651670.

Alvarez-Ramirez, J., Soriano, A., Cisneros, M., Suarez, R., 2003. Symmetry/anti-symmetry phase transitions in crude oil

markets. Physica A 322, 583596.

Barone-Adesi, G., Bourgoin, F., Giannopoulos, K., 1998. Don't look back. Risk August, pp. 100103.Bernabe, A., Martina, E., Alvarez-Ramirez, J., Ibarra-Valdez, C., 2004. A multi-model approach for describing crude oil

price dynamics. Physica A 338, 567584.

Box, G., Jenkins, G.M., Reinsel, G., 1994. Time Series Analysis: Forecasting and ControlThird edition. Prentice Hall.

Claudio, M., 2001. A semiparametric approach to short-term oil price forecasting. Energy Economics 23, 325338.

Crowder, W., Hamed, A., 1994. A cointegration test for oil futures market efficiency. Journal of Futures Markets 13 (8),

933941.

Dominguez, K.M., 1989. The volatility and efficiency of crude oil futures contracts, ch.2. In: Dominguez, K.M., Strong, J.S.,

Weiner, R.J. (Eds.), Oil and money: Coping with price risk through financial markets. Harvard International Energy

Studies, pp. 4897.

Farmer, J.D., Sidorowich, J.J., 1988. Predicting chaotic dynamics. In: Kelso, J.A.S., Mandell, A.J., Shlesinger, M.F. (Eds.),

Dynamic Patterns in Complex Systems. World Scientific, Singapore, pp. 265292.

Fong, W.M., See, K.H., 2002. A Markov switching model of the conditional volatility of crude oil futures prices. EnergyEconomics 24, 7195.

Gil-Alana, L.A., 2001. A fractionally integrated model with a mean shift for the US and the UK real oil prices. Economic

Modelling 18, 643658.

Gourieroux, C., 1997. ARCH Models and Financial Applications. Springer-Verlag.

Gulen, S.G., 1998. Efficiency in the crude oil futures markets. Journal of Energy Finance & Development 3 (1), 13 21.

He, Y., Chu, F., Zhong, B., 2002. A hierarchical evolutionary algorithm for constructing and training wavelet networks.

Journal of Energy Finance & Development 10, 357366.

Kaboudan, M.A., 2001. Compumetric forecasting of crude oil prices. Proceedings of the 2001 Congress on Evolutionary

Computation, vol. 1, pp. 283287.

Kermanshahi, B., Iwamiya, H., 2002. Up to year 2020 load forecasting using neural nets. Electrical Power and Energy

Systems 24, 789797.

Kodogiannis, V.S., Lolis, A., 2002. Forecasting financial time series using neural network and generalized system-based

techniques. Neural Computing & Applications 11, 90102.

Leigh, W., Odisho, E., Paz, N., Paz, M., 2000. Progress report: improving the stock price forecasting performance of the

bull flag heuristic with genetic algorithms and neural networks. IEA/AIE 2000, 617622.

Liu, J.N.K., Kwong, R.W.M., Bo, F., 2004. Chart patterns recognition and forecast using wavelet and radial basis function

network. KES 2004, 564571.

Moosa, I.A., Al-Loughani, N.E., 1994. Unbiasedness and time varying risk premia in the crude oil futures markets. Energy

Economics 16 (2), 99105.

Motnikar, B.S., Pisanski, T., Cepar, D., 1996. Time-series forecasting by pattern imitation. OR-Spektrum 18 (1), 4349.

Nunnari, G., 2004. Modelling air pollution time-series by using wavelet functions and genetic algorithms. Soft Computing

8, 173178.

Panas, E., Ninni, V., 2000. Are oil markets chaotic? A non-linear dynamic analysis. Energy Economics 22, 549

568.Peters, E., 1994. Fractal Market Hypothesis: Applying Chaos Theory to Investment and Economics. Wiley.

Prigmore, M., Long, J.A., 2003. A comparison of the effectiveness of neural and wavelet networks for insurer credit rating

based on publicly available financial data. IEA/AIE 2003, 527536.

Robinson, P.M., Yajima, Y., 2002. Determination of cointegrating rank in fractional systems. Journal of Econometrics 106,

217241.

Sadorsky, P., 2006. Modeling and forecasting petroleum futures volatility. Energy Economics 28, 467488.

Sfetsos, A., Coonick, A.H., 2000. Univariate and multivariate forecasting of hourly solar radiation with artificial

intelligence techniques. Solar Energy 68 (2), 169178.

Singh, S., 1999a. Noise impact on time-series forecasting using an intelligent pattern matching technique. Pattern

Recognition 32, 13891398.

Singh, S., 1999b. A long memory pattern modeling and recognition system for financial time-series forecasting. Pattern

Analysis & Applications 2, 264273.Singh, S., 2001. Multiple forecasting using local approximation. Pattern Recognition 34, 443455.

Singh, S., Fieldsend, Jonathan, 2001. Pattern matching and neural networks based hybrid forecasting system. ICAPR

2001, 7282.

Stefansson, A., Koncar, N., Jones, A.J., 1997. A note of the gamma test. Neural Computing & Applications 5, 131 133.



16/16

Tang, L., Hammoudeh, S., 2002. An empirical exploration of the world oil price under the target zone model. Energy

Economics 24, 577596.

Weiss, S.M., Kulikowski, C.A., 1991. Computer Systems That Learn. Morgan Kaufmann.

Wilson, I.D., Paris, S.D., Ware, J.A., Jenkins, D.H., 2002. Residential property price time series forecasting with neural

networks. Knowledge-Based Systems 15, 335341.Ye, M., Zyren, J., Shore, J., 2002. Forecasting crude oil spot price using OECD petroleum inventory levels. International

Advances in Economic Research 8, 324334.

Ye, M., Zyren, J., Shore, J., 2005. A monthly crude oil spot price forecasting model using relative inventories. International

Journal of Forecasting 21, 491501.

Ye, M., Zyren, J., Shore, J., 2006a. Forecasting short-run crude oil price using high- and low-inventory variables. Energy

Policy 34, 27362743.

Ye, M., Zyren, J., Shore, J., 2006b. Short-run crude oil price and surplus production capacity. International Advances in

Economic Research 12, 390394.

Yousefi, S., Weinreich, I., Reinarz, D., 2005. Wavelet-based prediction of oil prices. Chaos, Solitons and Fractals 25,

265275.

Zhang, G., Patuwo, B.E., Hu, M.Y., 1998. Forecasting with artificial neural networks: the state of the art. International

Journal of Forecasting 14, 3562.

