optimization of the hidden layer of a multilayer … of the hidden... · 2016-11-22 · network....
OPTIMIZATION OF THE HIDDEN LAYER OF A MULTILAYER PERCEPTRON WITH BACKPROPAGATION (BP) NETWORK USING
HYBRID K-MEANS-GREEDY ALGORITHM (KGA) FOR TIME SERIES PREDICTION
James Tan Yiaw Beng
Master of Engineering 2012
To my parents, family, relatives, friends and everyone else too numerous to name here
who have worked hard and tirelessly to assist in the completion of this research project
ACKNOWLEDGEMENT
First of all I would like to take this opportunity to give my sincerest appreciation to
my academic supervisors Ir. David Bong Boon Liang and Assoc. Prof. Ir. Dr. Andrew Ragai
Henry Rigit for their patience in providing their support, knowledge, guidance and feedback
throughout this research. I would also like to thank the late Mdm. Irene Lim and Mr. Law
Kok Heng of Sarawak Energy Bhd. for their consent and assistance in providing the necessary
data to be utilized in this research, as well as their comments and ideas during the research.
Special thanks also go out to my family, relatives and friends for their motivation and support
through the good times and the bad times I have experienced for the duration of this research.
I would also like to thank the Ministry of Science, Technology and Innovation
(MOSTI) and Universiti Malaysia Sarawak (UNIMAS) for financial support and the
necessary facilities in performing the activities necessary for this research. I would also like to
express my gratitude to Solar Influences Data Analysis Center (SIDC) of the Royal
Observatory of Belgium and the Department of Statistics, Malaysia for their consent in
providing the additional data to be utilized to evaluate the performance and effectiveness of
the proposed model developed during this research.
Last but not least, I would like to express my heartfelt thanks to all other individuals
who are too numerous to be named here and who were involved, either directly or indirectly,
for the duration of this research.
ABSTRACT
Research into the field of artificial neural networks (ANN) has been gaining interest in
recent years, because ANNs are fast becoming a popular tool of choice for predicting
time series trends. This surge in popularity can be attributed to the fact that an ANN,
especially a multilayer perceptron with backpropagation (BP) network that has the optimal
number of neurons in its hidden layer, is able to predict unknown values of the time series
it is trained with more accurately than other methods applied to the same time series.
The drawback of using BP networks in time series prediction is that it is difficult and
time-consuming to find the number of neurons in the hidden layer that minimizes the
prediction error. To overcome this drawback, we propose a model known as the
K-means-Greedy Algorithm (KGA) model. The proposed KGA model combines the greedy
algorithm with k-means++ clustering to automate the search for the optimal number of
neurons in the hidden layer of the BP network. The evaluation results of the proposed KGA
model on several time series, namely the sunspot data, the Mackey-Glass time series, and
electrical load forecasting using data from several econometric factors as well as historical
electricity demand data, show that the proposed KGA model is effective in finding the optimal
number of neurons for the hidden layer of a BP network that is used to perform time series
prediction.
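The combination described above can be illustrated with a minimal sketch. It is not the KGA algorithm as specified later in the thesis; it only assumes, from the description above, that k-means++ clustering is used to narrow the range of candidate hidden-layer sizes and that a greedy selection then picks a size within that range. The names `kga_sketch`, `evaluate` and `toy_error`, the min-max scaling, and the rule of choosing the cluster with the lowest mean error are all assumptions made for illustration. In practice `evaluate(n)` would train a BP network with `n` hidden neurons and return its prediction error.

```python
import random


def scale(values):
    """Min-max scale numbers to [0, 1] (a choice made for this sketch)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_pp(points, k, rng, iters=50):
    """Cluster 2-D points with k-means++ seeding; return one label per point.

    Assumes at least k distinct points, otherwise the seeding weights are all zero.
    """
    # Seeding: first centroid uniformly at random, the others with probability
    # proportional to the squared distance to the nearest chosen centroid.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(dist2(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels


def kga_sketch(evaluate, guesses, k=3, seed=0):
    """Narrow the search range by clustering, then select greedily within it."""
    rng = random.Random(seed)
    errors = [evaluate(n) for n in guesses]
    points = list(zip(scale(guesses), scale(errors)))
    labels = kmeans_pp(points, k, rng)

    def mean_error(label):
        members = [e for e, lab in zip(errors, labels) if lab == label]
        return sum(members) / len(members)

    best_label = min(set(labels), key=mean_error)  # assumed selection rule
    sizes = [n for n, lab in zip(guesses, labels) if lab == best_label]
    best_n, best_err = None, float("inf")
    # Greedy step: keep whichever size has the lowest error seen so far.
    for n in range(min(sizes), max(sizes) + 1):
        err = evaluate(n)
        if err < best_err:
            best_n, best_err = n, err
    return best_n, best_err


def toy_error(n):
    """Hypothetical stand-in for 'train a BP network with n neurons, return its error'."""
    return (n - 37) ** 2 / 1000 + 0.05


if __name__ == "__main__":
    print(kga_sketch(toy_error, list(range(1, 101, 5))))
```

The point of the clustering step in this sketch is that only the sizes inside one cluster's range are evaluated one by one, rather than every size between the smallest and largest guess.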
ABSTRAK
Research on artificial neural networks (ANN) has been attracting growing interest
of late, because ANNs are increasingly popular among researchers for obtaining
predictions of a time series. This rise in popularity arises because an ANN,
especially a multilayer perceptron with backpropagation algorithm (BP) network, that has
the optimal number of neurons in its hidden layer can predict the unknown values of a
time series more accurately than other methods. A notable weakness of the BP network is
that the process of obtaining the optimal BP network, which minimizes the error between
the predicted values and the actual values of the time series, is difficult and takes a
long time. For this reason, a model that combines two algorithms, namely the k-means++
clustering algorithm and the greedy algorithm, named the K-means-Greedy Algorithm (KGA)
model, has been developed to help users obtain the optimal number of neurons in the
hidden layer of the BP network. The model was evaluated using several time series,
namely the sunspot time series, the Mackey-Glass time series, and the forecasting of
future electricity consumption using factors that are able to influence electricity
consumption, as well as using past electricity consumption data. The results obtained
from these evaluations show that the KGA model developed is able to obtain the optimal
number of neurons for a BP network that is used to predict the known values in a time
series that has been used to train that BP network.
TABLE OF CONTENTS
Content
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1: INTRODUCTION
1.1 INTRODUCTION
1.2 PROBLEM STATEMENT
1.3 RESEARCH OBJECTIVES
1.4 THESIS OVERVIEW
CHAPTER 2: LITERATURE REVIEW
2.1 INTRODUCTION
2.1.1 Definition of an ANN
2.1.2 The basic building block of an ANN: A neuron
2.1.3 Using neurons to construct different ANN models
2.1.4 Training an ANN
2.1.5 Stopping the training of an ANN
2.1.6 Applications of ANN
2.2 MULTILAYER PERCEPTRON WITH
BACKPROPAGATION (BP) ALGORITHM
2.2.1 Architecture and algorithm of the BP network
2.2.2 Training a BP network and ending the training of the BP network
2.2.3 Using BP networks in prediction of time series trends
2.2.4 The importance of having hidden layers in BP networks
2.2.5 The importance of having the right size of the hidden layer
2.2.6 Optimizing the hidden layer in the BP network
2.3 OVERVIEW OF THE PROPOSED KGA MODEL
2.3.1 Parameters of the BP network to be optimized
2.3.2 Overall design of the proposed KGA model
2.3.3 Greedy algorithm
2.3.4 K-means++ clustering for search space reduction
2.4 SUMMARY
CHAPTER 3: HYBRID K-MEANS-GREEDY ALGORITHM (KGA) MODEL
3.1 INTRODUCTION
3.2 OVERVIEW OF THE HYBRID KGA MODEL
3.3 IMPLEMENTATION DETAILS OF THE HYBRID KGA MODEL
3.3.1 The need for a suitable number of neurons in the
hidden layer of the BP network
3.3.2 Optimization criteria and parameters
3.3.3 The algorithm of the proposed KGA model
3.4 BENCHMARK TIME SERIES DATA USED TO TEST
THE PROPOSED KGA MODEL
3.4.1 Annual sunspot data
3.4.2 Mackey-Glass time series
3.5 LOAD FORECAST OF THE TOTAL ELECTRICITY
DEMAND IN SARAWAK (1970-2005)
3.5.1 Econometric load forecast of the
total electricity demand in Sarawak (1970-2005)
3.5.2 The total electricity demand in Sarawak (1970-2005)
based on historical data
3.6 IMPLEMENTING THE PROPOSED KGA MODEL
3.6.1 The architecture, training and testing of the BP network
3.6.2 Executing the proposed KGA model
3.7 EVALUATING THE PERFORMANCE OF THE PROPOSED KGA MODEL
3.7.1 Verification of the choice made by the proposed KGA model
3.7.2 Performance of the BP network optimized by the proposed
KGA model versus performance of other methods
3.8 SUMMARY
CHAPTER 4: RESULTS, ANALYSIS AND DISCUSSION FOR
APPLICATIONS ON BENCHMARK DATA
4.1 INTRODUCTION
4.2 SUNSPOT DATA
4.2.1 Method of evaluation of the proposed KGA model
4.2.2 Criteria of evaluation of the proposed KGA model
4.2.3 Evaluation results of the proposed KGA model
4.2.4 Verification of results obtained using the proposed KGA model
4.2.5 Comparison between the results obtained using the proposed
KGA model with results obtained using another method
4.3 MACKEY-GLASS TIME SERIES DATA
4.3.1 Evaluating the prediction made by the BP network
4.3.2 Criteria of evaluation of the proposed KGA model
4.3.3 Method of evaluation of the proposed KGA model
4.3.4 Results of the evaluation of the proposed KGA model
4.3.5 Verification of the results obtained by the proposed KGA model
4.3.6 Comparison of the results obtained by the proposed KGA model
with results obtained using other methods
4.4 SUMMARY
CHAPTER 5: RESULTS, ANALYSIS AND DISCUSSION FOR
APPLICATIONS ON ELECTRICAL LOAD FORECASTING
5.1 INTRODUCTION
5.1.1 The experiments to be conducted
5.1.2 The criteria used for evaluation of the proposed KGA model
5.1.3 Structure of the results and discussion of the experiments
5.2 ECONOMETRIC LOAD FORECASTING USING 2 FACTORS
5.2.1 Introduction
5.2.2 Correlation between each of the factors and the total electricity demand
5.2.3 Architecture of the BP network and the error metric used
5.2.4 Performance of the proposed KGA model on predicting
the total electricity demand in Sarawak using 2 factors
5.2.5 Verification of the results obtained using the proposed KGA model
5.2.6 Effect of finding the optimal number of neurons in the hidden layer
of the BP network on the prediction of total electricity demand
for individual years based on 2 factors
5.3 USING 4 FACTORS TO PREDICT THE TOTAL ELECTRICITY DEMAND
5.3.1 Architecture of the BP network and the error metric used
5.3.2 Performance of the proposed KGA model in using 4 factors
to predict the total electricity demand in Sarawak
5.3.3 Verification of the performance of the proposed KGA model
5.3.4 Effect on finding of optimal number of neurons in the
hidden layer of the BP network on the prediction of
total electricity demand for individual years based on 4 factors
5.4 COMPARISON OF MAPE ERROR BETWEEN USING 4 FACTORS TO
PREDICT THE TOTAL ELECTRICITY DEMAND USING BP NETWORK,
USING 2 FACTORS TO PREDICT THE TOTAL ELECTRICITY DEMAND
USING BP NETWORK, AND USING MULTIPLE REGRESSION
5.5 PREDICTING THE TOTAL ELECTRICITY DEMAND IN
SARAWAK BASED ON PAST HISTORICAL DATA (1970-2005)
5.6 USING DATA FOR THE PAST 2 YEARS TO
PREDICT THE TOTAL ELECTRICITY DEMAND
5.6.1 Architecture of the BP network and the error metric used
5.6.2 Results of using the proposed KGA model
5.6.3 Verification of results obtained using the proposed KGA model
5.6.4 Effect on finding of optimal number of neurons in the hidden layer
of the BP network on the prediction of total electricity demand
for individual years based on historical data for 2 years
5.7 USING DATA FROM THE PAST 5 YEARS TO
PREDICT THE TOTAL ELECTRICITY DEMAND
5.7.1 Architecture of the BP network and the error metric used
5.7.2 Results obtained from implementing the proposed KGA model
5.7.3 Verification of the results obtained from implementing
the proposed KGA model
5.7.4 Effect on finding of optimal number of neurons in the hidden layer
of the BP network on the prediction of total electricity demand
for individual years based on historical data for 5 years
5.8 COMPARISON BETWEEN THE PERFORMANCES OF THE BP
NETWORKS OPTIMIZED BY THE PROPOSED KGA MODEL
AND THE REGRESSION METHOD
5.9 SUMMARY
CHAPTER 6: CONCLUSION AND FUTURE WORK
6.1 BACKGROUND OF THE RESEARCH
6.2 SUMMARY OF THE RESEARCH
6.3 RESEARCH FINDINGS AND CONTRIBUTIONS
6.4 SUGGESTIONS FOR FUTURE WORK
REFERENCES
APPENDIX 1: SUNSPOT DATA
APPENDIX 2: MACKEY-GLASS TIME SERIES DATA
APPENDIX 3: GROSS DOMESTIC PRODUCT (GDP)
OF SARAWAK (1970-2005)
APPENDIX 4: POPULATION OF SARAWAK (1970-2005)
APPENDIX 5: CONSUMER PRICE INDEX (CPI) OF SARAWAK (1970-2005)
APPENDIX 6: COMPARISON BETWEEN MULTILAYER PERCEPTRON
WITH BACKPROPAGATION ALGORITHM AND RADIAL
BASIS FUNCTION NETWORKS TO PERFORM FORECAST
OF ELECTRICITY DEMAND
APPENDIX 7: APPLICATION OF MULTILAYER PERCEPTRON WITH
BACKPROPAGATION ALGORITHM AND REGRESSION
ANALYSIS FOR LONG-TERM FORECAST OF ELECTRICITY
DEMAND: A COMPARISON
APPENDIX 8: USING HYBRID K-MEANS-GREEDY ALGORITHM TO
OPTIMIZE THE HIDDEN LAYER OF A BACKPROPAGATION
NETWORK FOR TIME SERIES PREDICTION
LIST OF FIGURES
Figure
2.1 A neuron with R inputs and a bias (Hippert et al., 2001)
2.2 Flow chart of the supervised training process of an ANN
2.3 Reinforcement learning process (Kumar, 2004)
2.4 A multilayer perceptron with backpropagation algorithm (BP)
network (Basheer and Hajmeer, 2000)
2.5 The pseudocode of the backpropagation algorithm
2.6 Linearly-separable problems and nonlinearly-separable problems
(Basheer and Hajmeer, 2000)
2.7 The effect of different size of the hidden layer on network generalization
(Basheer and Hajmeer, 2000)
2.8 Pseudocode of the greedy algorithm
2.9 The flow chart of the proposed KGA model
3.1 How the proposed KGA model is to be used to optimize the number of
hidden neurons in the BP network
3.2 Pseudocode of the algorithm of the proposed KGA model
3.3 The process of the selection of initial values of centroids after the first
centroid is chosen uniformly at random among the observations Xi
3.4 The step-by-step process of clustering using k-means++ clustering
(Wikipedia, 2009)
3.5 The process of repeated clustering on the database of correlations
between errors and the number of neurons in the hidden layer
3.6 The process of evaluation of values of the number of neurons in the
hidden layer of the BP network by greedy algorithm
3.7 Annual sunspot number (1770-1869)
3.8 (a) When τ ≤ 17 (in this case, τ = 6), the Mackey-Glass time series is
oscillating at a period of 20 units.
(b) However, when τ ≥ 17 (in this case, τ = 20), the Mackey-Glass time series
follows a chaotic pattern. This figure is adapted from Mackey and Glass (1977)
3.9 Total demand for electricity in Sarawak from the years 1970-2005
3.10 Gross Domestic Product of Sarawak at constant price with the 1987
base year from the years 1970-2005
3.11 Population of Sarawak from the years 1970-2005
3.12 Consumer Price Index (CPI) of Sarawak from the years 1970-2005
with the year 2000 as the base year
3.13 Number of customers of electricity in Sarawak from the years 1970-2005
3.14 The method implemented to evaluate the effectiveness of the proposed
KGA model implemented to optimize the number of neurons
in the hidden layer of the BP network
3.15 The method implemented to compare the performance of the BP
network optimized by the proposed KGA model against methods
proposed by other researchers
4.1 The architecture of the BP network that is used to predict the sunspot number
4.2 Clustering of the errors and the corresponding number of neurons
in the hidden layer
4.3 The values of the points in cluster 2 in Figure 4.2
4.4 Clustering result of the values that are inside cluster 2 in Figure 4.2
4.5 Clustering of the range of values represented by cluster 3 in Figure 4.3
4.6 Plot of MSE vs. the number of neurons in the hidden layer of the BP network
4.7 Clustering of the errors and the corresponding number of neurons in
the hidden layer
4.8 The results of clustering the red cluster shown in Figure 4.7
4.9 Plot of the number of neurons in the hidden layer of the BP network
compared to RMSE error
5.1 The results of using the k-means++ algorithm to partition the list of guesses
attempted into 3 clusters
5.2 The result of clustering of the guesses made within the
range of the values shown in Table 5.3
5.3 The effect of changing the number of neurons in the hidden layer of the BP
network on MAPE when the BP network is considering 2 factors to predict
the future electricity demand
5.4 The effects of having different number of neurons in the hidden layer of the
BP network on individual differences between the actual and predicted
values of total electricity demand for the years 2000 through year 2005
5.5 The MAPE between the actual values and the predicted values
of total electricity demand for each of the years 2000 through 2005
5.6 Individual MAPE values for 2000-2005 predictions of total electricity
demand produced by setting the number of neurons in the hidden layer
of the BP network to be between 16 and 24 neurons
5.7 The results of using the k-means++ algorithm to partition the list of
guesses attempted into 3 clusters
5.8 The result of clustering of the guesses made within the range of the values
shown in Table 5.10
5.9 The effect of having different number of neurons in the hidden layer
of the BP network on MAPE
5.10 The MAPE errors achieved by setting the number of neurons
in the hidden layer of the BP network to values between 113 and 132
5.11 The effects of having different number of neurons in the hidden layer
of the BP network on differences between the actual and predicted
values of total electricity demand for each of the years 2000 through 2005
5.12 The MAPE between the actual values and the predicted values of
total electricity demand for each of the years 2000 through 2005
5.13 Individual MAPE values for 2000-2005 predictions of total electricity
demand produced by setting the number of neurons in the hidden layer
of the BP network to be between 16 and 24 neurons
5.14 Comparison between the actual total electricity demand for the years 2000-2005
and the predicted values of the total electricity demand by the optimal BP
networks and the multiple regression model
5.15 Inputting the electricity demand for the years 1970 and 1971 to predict
the electricity demand for the year 1972
5.16 The results of using the k-means++ algorithm to partition the attempted
guesses into 3 clusters
5.17 The result of clustering of the guesses made within the range of the
value shown in Table 5.16
5.18 The MAPE error achieved by changing the number of neurons in the hidden
layer of the BP network
5.19 The MAPE error resulting from having between 11 and 51 neurons
in the hidden layer of the BP network
5.20 The individual MAPE for the years 2000-2005 with different number
of neurons in the hidden layer of the BP network that were
considered by the proposed KGA model to be the most accurate
5.21 The inputs to the BP network and the output of the BP network,
with I representing the year the particular data is taken
5.22 The results of using the k-means++ algorithm to partition
the list of guesses attempted into 3 clusters
5.23 The result of clustering of the guesses made within the range of the values
shown in Table 5.22
5.24 The effects of having different number of neurons in the
hidden layer on MAPE error
5.25 The average MAPE achieved when there are between 90 and 95 neurons in
the hidden layer of the BP network
5.26 Individual MAPE values for the years 2000 through 2005 when there
are between 90 and 95 neurons in the hidden layer of the BP network
LIST OF TABLES
Table
2.1 Several transfer functions used in the ANN
4.1 Coordinates of the cluster centroids of the values shown in Figure 4.3
4.2 The values guessed from cluster 1 shown in Figure 4.4
4.3 The number of neurons in the hidden layer and their respective
MSE evaluated by the greedy algorithm
4.4 The number of the hidden neurons that produce smallest MSE errors
4.5 MSE errors obtained by using the optimal BP network
and other methods cited by Park et al. (1996)
4.6 The values of the points in the red cluster in Figure 4.7
4.7 Coordinates of the cluster centroids of the values shown in Table 4.7
4.8 The number of neurons in the hidden layer and their respective RMSE
evaluated by the greedy algorithm
4.9 Number of the hidden neurons that produce smallest RMSE errors
4.10 RMSE errors obtained by using the optimal BP network
and other methods cited by reference Chen et al. (2006)
5.1 The correlation between 4 factors and the total electricity demand
5.2 Cluster centroids of the clusters shown in Figure 5.1
5.3 Values of numbers of neurons in the hidden layer of the BP network
in the cluster 3 shown in Table 5.2
5.4 The coordinates of cluster centroids shown in Figure 5.2
5.5 The values encompassed within cluster 1 shown in Figure 5.2
5.6 The result of using greedy algorithm to evaluate the MAPE caused by having
a certain number of neurons in the hidden layer of the BP network
5.7 The values of the MAPE when there are not more than
29 neurons in the hidden layer of the BP network
5.8 The individual MAPE values compared with the number of neurons
in the hidden layer of the BP network
5.9 Cluster centroids of the clusters shown in Figure 5.7
5.10 Values of numbers of neurons in the hidden layer of the BP network
in the cluster 3
5.11 The coordinates of cluster centroids shown in Figure 5.8
5.12 The values encompassed within cluster 1 shown in Figure 5.8
5.13 The result of using greedy algorithm to evaluate the MAPE
caused by having a certain number of neurons in the
hidden layer of the BP network
5.14 The individual MAPE values compared with the
number of neurons in the hidden layer of the BP network
5.15 Cluster centroids of the clusters shown in Figure 5.16
5.16 Values of numbers of neurons in the hidden layer of the
BP network in the cluster 3
5.17 The coordinates of cluster centroids shown in Figure 5.17
5.18 The values encompassed within cluster 1 shown in Figure 5.17
5.19 The result of using greedy algorithm to evaluate the MAPE caused by having
a certain number of neurons in the hidden layer of the BP network
5.20 MAPE achieved using the number of neurons in the hidden layer
of the BP network
5.21 Cluster centroids of the clusters shown in Figure 5.22
5.22 Value of numbers of neurons in the hidden layer of the BP network in
the cluster 3 shown in Table 5.21
5.23 The coordinates of cluster centroids shown in Figure 5.23
5.24 The values encompassed within cluster 2 shown in Figure 5.23
5.25 The result of using greedy algorithm to evaluate the MAPE caused by
having a certain number of neurons in the hidden layer of the BP network
5.26 The MAPE achieved using the BP network optimized by
the proposed KGA model
5.27 A comparison between the actual and predicted total electricity demand
for the years 2000 through 2005 using the regression model
described using Equation (5.6)
5.28 A comparison between the actual and predicted total electricity demand
for the years 2000 through 2005 using several different methods
5.29 The MAPE errors achieved by the optimal BP networks as well as
the MAPE errors achieved using the regression model
LIST OF ABBREVIATIONS
Abbreviation    Description
ANN             Artificial Neural Network
BP              Multilayer Perceptron with Backpropagation Network
CPI             Consumer Price Index
GDP             Gross Domestic Product
KGA             K-means-Greedy Algorithm
MAPE            Mean Absolute Percentage Error
MSE             Mean Squared Error
RMSE            Root Mean Squared Error
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
Artificial neural networks (ANN) are information processing tools inspired by the way
that a human brain works. ANNs have been successfully implemented to solve various tasks
in recent years, such as forecasting of electricity demand (Hippert et al., 2001; Al-Shareef
et al., 2008), character and image recognition, credit evaluation, insurance (Huang, 2009),
pattern recognition and classification (Karayiannis and Behnke, 2000; Pham et al., 2006a),
and daily water level estimation (Bustami et al., 2006). Numerous types of ANNs
have been developed over the years, such as the radial basis function (RBF) network
(Chen et al., 1991), Kohonen's self-organizing map (SOM) (Kohonen, 1990) and perceptron
networks (Rosenblatt, 1958). Each of these types has its own set of strengths and
weaknesses, and each is suitable for solving certain types of problems. For instance, RBF
network models are suitable in applications that contain a lot of training data, since they
take less time to be trained (Hagan et al., 1996; Bong and Tan, 2007). The most popular type
of network is the multilayer perceptron with backpropagation algorithm (BP) network
(Rumelhart et al., 1986), which was developed as a method to overcome the inability of
perceptrons to solve problems that are not linearly separable (Huang, 2009). It consists of
one or more layers of neurons, known as hidden layers, sandwiched between an input layer
that relays data from the external environment to the network and the output layer that
displays the results of the information processed by the entire network.
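The limitation of a single perceptron mentioned above can be illustrated with a short sketch. It is an illustration only and is not taken from this research: it assumes a single neuron with a step activation trained with the classic perceptron learning rule, and the function names `train_perceptron` and `accuracy` are hypothetical. The neuron learns the linearly separable AND function, whereas no choice of weights lets it classify all four XOR patterns correctly.

```python
def train_perceptron(samples, epochs=100):
    """Train one step-activation neuron with the perceptron learning rule."""
    w = [0.0, 0.0]  # one weight per input
    b = 0.0         # bias
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out
            if err:
                # Move the decision line towards the misclassified pattern.
                w[0] += err * x1
                w[1] += err * x2
                b += err
                mistakes += 1
        if mistakes == 0:  # every pattern classified correctly
            break
    return w, b


def accuracy(samples, w, b):
    """Fraction of patterns that the trained neuron classifies correctly."""
    hits = 0
    for (x1, x2), target in samples:
        out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        hits += out == target
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    for name, samples in (("AND", AND), ("XOR", XOR)):
        w, b = train_perceptron(samples)
        print(name, accuracy(samples, w, b))
```

Adding a hidden layer between the inputs and the output, as in the BP network, is what allows a decision region that is not a single straight line.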
The BP network is a popular tool among researchers due to the ease with which it can be
generated and its ability to generalize relatively well (Zhang et al., 1998), in addition to
its relatively straightforward implementation (Kumar, 2004). The main drawback of using BP
networks is that while it is easy to create and train a BP network, it is difficult to obtain
the BP network with the appropriate size and parameters that will provide the most accurate
results for the problem at hand. This is particularly true when it comes to determining the
number of hidden layers and the number of neurons in each hidden layer of the BP network. In
fact, the number of hidden layers and the number of neurons in each hidden layer exert a
lot of influence on the final output and hence on the network performance. This is because
the BP network must have enough neurons in the hidden layer to form a decision region that
is as complex as that required by the problem at hand in order to solve the problem with
desirable results (Kumar, 2004).
Modifying the number of neurons in the hidden layer of the BP network will have a
profound impact on the processing of information that the network receives. It has been
discovered that the network does not train well if there are not enough neurons in the hidden
layer during training. An insufficient number of hidden neurons may lead to bad error
tolerance and poor generalization due to inadequate detection of the patterns underlying the
data given to the BP network (Mehta and Gohel, 2005). On the other hand, the time taken by
the BP network to learn will increase when there are too many neurons in the hidden layer of
the BP network (Zheng and He, 2004). In addition to that, the BP network will simply memorize
the data set that is given to it during training. As a result, the BP network is able to
perform well in producing the expected outputs when it is given data that is exactly the same
as the data used during training, but produces very poor results when it is given data that is
different from the data that was used to train it (Reed, 1991).
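The effect of the hidden layer size described above can be observed with a small experiment. The following is a minimal sketch under stated assumptions, not the BP network or the data used in this research: it assumes one hidden layer of tanh neurons, a single linear output neuron, per-sample gradient descent, and a hypothetical noisy sine series in which the previous 3 values are used to predict the next one. All function names and parameter values are illustrative. It reports the mean squared error on the training data and on held-out data for several hidden layer sizes so that the two can be compared.

```python
import math
import random


def make_series(n=60):
    """Hypothetical toy series: a noisy sine wave (not a data set from the thesis)."""
    rng = random.Random(1)
    return [math.sin(0.3 * t) + rng.gauss(0, 0.05) for t in range(n)]


def windows(series, lag=3):
    """Pair each run of `lag` consecutive values with the value that follows it."""
    return [(series[i:i + lag], series[i + lag]) for i in range(len(series) - lag)]


def forward(x, W, v):
    """Return the hidden activations and the network output for input x."""
    # The last entry of each weight vector is the bias; zip stops before it.
    h = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) for w in W]
    out = sum(vj * hj for vj, hj in zip(v, h)) + v[-1]
    return h, out


def train_bp(data, hidden, epochs=200, lr=0.01, seed=0):
    """Train a one-hidden-layer network with backpropagation (per-sample updates)."""
    rng = random.Random(seed)
    lag = len(data[0][0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(lag + 1)] for _ in range(hidden)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, y in data:
            h, out = forward(x, W, v)
            err = out - y
            for j in range(hidden):
                # Error signal propagated back through the tanh neuron.
                delta = err * v[j] * (1 - h[j] ** 2)
                v[j] -= lr * err * h[j]
                for i in range(lag):
                    W[j][i] -= lr * delta * x[i]
                W[j][-1] -= lr * delta
            v[-1] -= lr * err
    return W, v


def mse(data, W, v):
    """Mean squared error of the network on a list of (input, target) pairs."""
    return sum((forward(x, W, v)[1] - y) ** 2 for x, y in data) / len(data)


def sweep(sizes=(1, 4, 16)):
    """Train one network per hidden layer size; return (training MSE, held-out MSE)."""
    data = windows(make_series())
    train, test = data[:40], data[40:]
    results = {}
    for hidden in sizes:
        W, v = train_bp(train, hidden)
        results[hidden] = (mse(train, W, v), mse(test, W, v))
    return results


if __name__ == "__main__":
    for hidden, (train_err, test_err) in sweep().items():
        print(hidden, round(train_err, 4), round(test_err, 4))
```

A large gap between the training error and the held-out error for a given size is the symptom of memorization described above, while a high error on both suggests too few neurons.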
This problem of finding the optimal number of neurons in the hidden layer of a BP
network has a grave impact on the overall design of the BP network, and on its suitability
to solve the problem at hand. Firstly, since the number of neurons in the hidden layer is not
optimal, it means that the BP network is not at its full potential to analyze and solve the
problem, as the decision region formed by the network is not at the required complexity to
solve the task at hand. This also means that the results that are produced by the network are
not the best possible, since better results could be obtained from the network. Thus, not only
does this mean that the BP network is poorly designed, it also shows that BP networks, and ANN
in … tications. In fact, Hippert et al. (2001) noted that the poor design of the number of
neurons in the hidden layer is the main reason that researchers are not entirely convinced of
the ability of the ANN in forecasting, although the implementation of ANN as forecasting
tools is promising, and much work needs to be done before the ANN can be accepted as
standard forecasting tools. This view is also supported by Adya and Collopy (1998).
In view of the impact of the problem of optimizing the size of the hidden layer in the
BP network, several optimization methods have been proposed by researchers in recent years.
These methods can be broadly classified into 3 groups, namely pruning-based methods that
remove parameters inside the BP network that do not contribute to better results (Sietsma
1988; Reed, 1991), methods based on exploitation of statistical knowledge of the …,
such as the model developed by Salazar Aguilar et al. (2006), and methods