Research Article: Levenberg-Marquardt Algorithm for Mackey-Glass Chaotic Time Series Prediction
Junsheng Zhao,1 Yongmin Li,2,3 Xingjiang Yu,1 and Xingfang Zhang1
1 School of Mathematics, Liaocheng University, Liaocheng 252059, China
2 School of Science, Huzhou University, Huzhou 313000, China
3 School of Automation, Southeast University, Nanjing 210096, China
Correspondence should be addressed to Junsheng Zhao: zhaojunshshao@163.com
Received 9 August 2014; Accepted 11 October 2014; Published 11 November 2014
Academic Editor: Rongni Yang
Copyright © 2014 Junsheng Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
For decades, Mackey-Glass chaotic time series prediction has attracted increasing attention. When the multilayer perceptron is used to predict the Mackey-Glass chaotic time series, what we should do is minimize the loss function. As is well known, the convergence of the loss function is rapid at the beginning of the learning process, while convergence becomes very slow when the parameter is near the minimum point. In order to overcome these problems, we introduce the Levenberg-Marquardt algorithm (LMA). Firstly, a brief introduction is given to the multilayer perceptron, including its structure and the model approximation method. Secondly, we introduce the LMA and discuss how to implement it. Lastly, an illustrative example is carried out to show the prediction efficiency of the LMA. Simulations show that the LMA can give more accurate predictions than the gradient descent method.
1. Introduction
The Mackey-Glass chaotic time series is generated by the following nonlinear time-delay differential equation:

$$\frac{dx(t)}{dt} = \frac{\beta x(t-\tau)}{1 + x^{n}(t-\tau)} - \gamma x(t), \quad (1)$$
where $\beta$, $\gamma$, $\tau$, and $n$ are real numbers. Depending on the values of the parameters, this equation displays a range of periodic and chaotic dynamics. Such a series has some short-range time coherence, but long-term prediction is very difficult.
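As a concrete illustration, the series defined by (1) can be generated numerically. The sketch below is a minimal approach using a simple Euler scheme with a constant initial history; the function name `mackey_glass`, the step size, and the constant-history choice are our own, and the parameter values match the example used later in Section 4.

```python
import numpy as np

def mackey_glass(beta=0.2, gamma=0.1, n=10, tau=17, dt=0.1, n_samples=500, x0=1.2):
    """Integrate dx/dt = beta*x(t-tau)/(1 + x(t-tau)^n) - gamma*x(t) with a
    simple Euler scheme; the history on [-tau, 0] is held constant at x0."""
    steps_per_sample = int(round(1.0 / dt))
    delay = int(round(tau / dt))
    total = n_samples * steps_per_sample
    x = np.full(total + delay, x0)          # first `delay` entries serve as history
    for t in range(delay, total + delay - 1):
        x_tau = x[t - delay]                # delayed state x(t - tau)
        x[t + 1] = x[t] + dt * (beta * x_tau / (1.0 + x_tau**n) - gamma * x[t])
    return x[delay::steps_per_sample]       # one sample per unit of time

series = mackey_glass()                     # 500 samples of the tau = 17 series
```

A finer step or a higher-order integrator would give a more faithful trajectory; this sketch is only meant to show the structure of the delayed update.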
Originally, Mackey and Glass proposed the following equation to illustrate the appearance of complex dynamics in physiological control systems by way of bifurcations in the dynamics:

$$\frac{dx}{dt} = \frac{\beta x_{\tau}}{1 + x_{\tau}^{n}} - \gamma x, \quad \gamma, \beta, n > 0, \quad (2)$$

where $x_{\tau} = x(t - \tau)$.
They suggested that many physiological disorders, called dynamical diseases, were characterized by changes in qualitative features of dynamics. The qualitative changes of physiological dynamics corresponded mathematically to bifurcations in the dynamics of the system. The bifurcations in the equation dynamics could be induced by changes in the parameters of the system, as might arise from disease or environmental factors such as drugs, or by changes in the structure of the system [1, 2].
The Mackey-Glass equation has also had an impact on more rigorous mathematical studies of delay-differential equations. Methods for the analysis of some properties of delay differential equations, such as the existence of solutions and the stability of equilibria and periodic solutions, had already been developed [3]. However, the existence of chaotic dynamics in delay-differential equations was unknown. Subsequent studies of delay differential equations with monotonic feedback have provided significant insight into the conditions needed for oscillation and the properties of oscillations [4–6]. For delay differential equations with nonmonotonic feedback, mathematical analysis has proven
Hindawi Publishing Corporation, Discrete Dynamics in Nature and Society, Volume 2014, Article ID 193758, 6 pages. http://dx.doi.org/10.1155/2014/193758
[Figure 1: A multilayer perceptron with one hidden layer: input $\mathbf{x} \in R^n$, input-to-hidden weights $W$, hidden biases $b_1, \ldots, b_m$, hidden-to-output weights $V$, output bias $b'$, and output $y$.]
much more difficult. However, rigorous proofs of chaotic dynamics have been obtained for the differential delay equation $dx/dt = g(x(t-1))$ for special classes of the feedback function $g$ [7]. Further, although a proof of chaotic dynamics in the Mackey-Glass equation has still not been found, progress continues on understanding the properties of delay differential equations, such as (2), that contain both exponential decay and nonmonotonic delayed feedback [8]. The study of this equation remains a topic of vigorous research.
Mackey-Glass chaotic time series prediction is a very difficult task. The aim is to predict the future state $x(t + \Delta T)$ using the current and past time series $x(t), x(t-1), \ldots, x(t-n)$ (Figure 2). There is by now a large literature on Mackey-Glass chaotic time series prediction [9–14]. However, as far as prediction accuracy is concerned, most of the results in the literature are not ideal.
In this paper, we predict the Mackey-Glass chaotic time series with the MLP. While minimizing the loss function, we introduce the LMA, which can adjust the convergence speed and obtain good convergence efficiency.
The rest of the paper is organized as follows. In Section 2, we describe the multilayer perceptron. Section 3 introduces the LMA and discusses how to implement it. In Section 4, we give a numerical example to demonstrate the prediction efficiency. Section 5 contains the conclusions and discussions of the paper.
2. Preliminaries
2.1. Multilayer Perceptrons. A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. The multilayer perceptron with only one hidden layer is depicted in Figure 1 [15].
In Figure 1, $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T \in R^n$ is the model input, $y$ is the model output, $W = \{w_{ij}\}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, is the connection weight from $x_i$ to the $j$th hidden unit, $V = \{v_j\}$ is the connection weight from the $j$th
[Figure 2: Mackey-Glass chaotic time series.]
hidden unit to the output unit, and $b_j$, $j = 1, 2, \ldots, m$, and $b'$ are the biases.
The output of the multilayer perceptron described in Figure 1 is

$$y = \sum_{j=1}^{m} v_j h_j + b', \quad (3)$$
and the outputs of the hidden units are

$$h_j = \varphi\Big(\sum_{i=1}^{n} w_{ij} x_i + b_j\Big), \quad (4)$$

respectively, where $\varphi(\cdot)$ is the activation function. We adopt the sigmoid function as the activation function:

$$\varphi(x) = \frac{1}{1 + e^{-x}}, \quad (5)$$
and the derivative of the activation function with respect to $x$ is

$$\varphi'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \quad (6)$$

or, equivalently,

$$\varphi'(x) = \varphi(x)\,(1 - \varphi(x)). \quad (7)$$
The MLP provides a universal method for function approximation and classification [16, 17]. In the case of function approximation, we have a number of observed data $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_L, y_L)$, which are supposed to be generated by

$$y = f_0(\mathbf{x}) + \xi, \quad (8)$$

where $\xi$ is noise, usually subject to a Gaussian distribution with zero mean, and $f_0(\mathbf{x})$ is the unknown true generating function.
Given a set of observed data, sometimes called training examples, we search for the parameters

$$\theta = (b_1, \ldots, b_m, w_{11}, \ldots, w_{n1}, w_{12}, \ldots, w_{n2}, \ldots, w_{1m}, \ldots, w_{nm}, b', v_1, \ldots, v_m)^T \in R^{(n+1)m+m+1} \quad (9)$$

that approximate the teacher function $f_0(\mathbf{x})$ best, where $T$ denotes matrix transposition. A satisfactory model is often obtained by minimizing the mean square error.
One of the serious problems in minimizing the mean square error is that convergence is rapid at the beginning of the learning process, while it becomes very slow in the region of the minimum [18]. In order to overcome this problem, we introduce the Levenberg-Marquardt algorithm (LMA) in the next section.
2.2. The Levenberg-Marquardt Algorithm. In mathematics and computing, the Levenberg-Marquardt algorithm (LMA) [18–20], also known as the damped least-squares (DLS) method, is used to solve nonlinear least squares problems. These minimization problems arise especially in least squares curve fitting.
The LMA interpolates between the Gauss-Newton algorithm (GNA) and the gradient descent algorithm (GDA). As far as robustness is concerned, the LMA performs better than the GNA, which means that in many cases it finds a solution even if it starts very far from the minimum. However, for well-behaved functions and reasonable starting parameters, the LMA tends to be a bit slower than the GNA.
In many real applications of model fitting, we often adopt the LMA. However, like many other fitting algorithms, the LMA finds only a local minimum, which is not always the global minimum.
The least squares curve fitting problem is described as follows. Instead of the unknown true model, a set of $N$ data pairs $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)$ is given. Suppose that $f(\mathbf{x}, \theta)$ is the approximation model and $L(\theta)$ is a loss function, the sum of the squares of the deviations:

$$L(\theta) = \sum_{i=1}^{N} \big[y_i - f(\mathbf{x}_i, \theta)\big]^2. \quad (10)$$

The task of the curve fitting problem is to minimize the above loss function $L(\theta)$ [21].
The LMA is an iterative algorithm, and the parameter $\theta$ is adjusted in each iteration step. Generally speaking, we choose an initial parameter randomly, for example, $\theta_i \sim U(-1, 1)$, $i = 1, 2, \ldots, n$, where $n$ is the dimension of the parameter $\theta$.
The Taylor expansion of the function $f(\mathbf{x}_i, \theta + \Delta\theta)$ is

$$f(\mathbf{x}_i, \theta + \Delta\theta) \approx f(\mathbf{x}_i, \theta) + \frac{\partial f(\mathbf{x}_i, \theta)}{\partial \theta}\,\Delta\theta. \quad (11)$$
As we know, at the minimum $\theta^*$ of the loss function $L(\theta)$, the gradient of $L(\theta)$ with respect to $\theta$ is zero. Substituting (11) into (10), we obtain

$$L(\theta + \Delta\theta) \approx \sum_{i=1}^{N} \big(y_i - f(\mathbf{x}_i, \theta) - J_i \Delta\theta\big)^2, \quad (12)$$

where $J_i = \partial f(\mathbf{x}_i, \theta)/\partial\theta$, or

$$L(\theta + \Delta\theta) \approx \big\| \mathbf{y} - \mathbf{f}(\theta) - \mathbf{J}\Delta\theta \big\|^2. \quad (13)$$
Taking the derivative with respect to $\Delta\theta$ and setting the result to zero gives

$$\big(\mathbf{J}^T \mathbf{J}\big)\,\Delta\theta = \mathbf{J}^T \big(\mathbf{y} - \mathbf{f}(\theta)\big), \quad (14)$$

where $\mathbf{J}$ is the Jacobian matrix whose $i$th row equals $J_i$, and $\mathbf{y}$ and $\mathbf{f}$ are vectors with $i$th components $y_i$ and $f(\mathbf{x}_i, \theta)$, respectively. This is a set of linear equations which can be solved for $\Delta\theta$.
Levenberg's contribution is to replace this equation by a "damped version,"

$$\big(\mathbf{J}^T \mathbf{J} + \lambda I\big)\,\Delta\theta = \mathbf{J}^T \big[\mathbf{y} - \mathbf{f}(\theta)\big], \quad (15)$$

where $I$ is the identity matrix, giving the increment $\Delta\theta$ to the estimated parameter vector $\theta$.
The damping factor $\lambda$ is adjusted at each iteration step. If the loss function $L(\theta)$ decreases rapidly, $\lambda$ is given a small value, and the LMA then behaves like the Gauss-Newton algorithm. If the loss function decreases very slowly, $\lambda$ can be increased, giving a step closer to the gradient descent direction, since

$$\frac{\partial L(\theta)}{\partial \theta} = -2\,\mathbf{J}^T \big[\mathbf{y} - \mathbf{f}(\theta)\big]. \quad (16)$$

Therefore, for large values of $\lambda$, the step is taken approximately in the direction of the negative gradient.
During the iteration, if either the length of the calculated step $\Delta\theta$ or the reduction of $L(\theta)$ at the latest parameter vector $\theta + \Delta\theta$ falls below predefined limits, the iteration stops, and we take the last parameter vector $\theta$ as the final solution.
Levenberg's algorithm has the disadvantage that if the value of the damping factor $\lambda$ is large, the curvature information in $\mathbf{J}^T\mathbf{J}$ is not used at all. Marquardt provided the insight that we can scale each component of the gradient according to the curvature, so that there is larger movement along the directions where the gradient is smaller; this avoids slow convergence in directions of small gradient. Therefore, Marquardt replaced the identity matrix $I$ with the diagonal matrix consisting of the diagonal elements of $\mathbf{J}^T\mathbf{J}$, resulting in the Levenberg-Marquardt algorithm [22]:

$$\big[\mathbf{J}^T\mathbf{J} + \lambda\,\mathrm{diag}\big(\mathbf{J}^T\mathbf{J}\big)\big]\,\Delta\theta = \mathbf{J}^T \big[\mathbf{y} - \mathbf{f}(\theta)\big], \quad (17)$$
and the LMA update is

$$\theta_{t+1} = \theta_t + \Delta\theta, \quad (18)$$

where

$$\Delta\theta = \big[\mathbf{J}^T\mathbf{J} + \lambda\,\mathrm{diag}\big(\mathbf{J}^T\mathbf{J}\big)\big]^{-1} \mathbf{J}^T \big[\mathbf{y} - \mathbf{f}(\theta)\big]. \quad (19)$$
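The iteration (17)-(19), with the damping and stopping rules described above, can be sketched as follows. The accept/reject schedule for $\lambda$ (halve on improvement, double otherwise) and the exact factors are our own choices, since the text does not fix them; the toy exponential model at the bottom is only for illustration.

```python
import numpy as np

def levenberg_marquardt(f, jac, theta, x, y, lam=1e-2, max_iter=200, tol=1e-10):
    """Minimize L(theta) = sum_i (y_i - f(x_i, theta))^2 via the damped
    normal equations (17): [J^T J + lam*diag(J^T J)] dtheta = J^T (y - f)."""
    loss = np.sum((y - f(x, theta))**2)
    for _ in range(max_iter):
        r = y - f(x, theta)                      # residual vector y - f(theta)
        J = jac(x, theta)                        # N x p Jacobian, row i = df(x_i, theta)/dtheta
        A = J.T @ J
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)
        new_loss = np.sum((y - f(x, theta + step))**2)
        if new_loss < loss:                      # rapid reduction: shrink lam (Gauss-Newton-like)
            theta, loss, lam = theta + step, new_loss, 0.5 * lam
        else:                                    # insufficient reduction: grow lam (gradient-descent-like)
            lam *= 2.0
        if np.linalg.norm(step) < tol:           # step length below the predefined limit
            break
    return theta

# toy curve fit: y = a * exp(b * x), true parameters (2, -1.5)
def f(x, th):
    return th[0] * np.exp(th[1] * x)

def jac(x, th):
    return np.column_stack([np.exp(th[1] * x), th[0] * x * np.exp(th[1] * x)])

x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.5 * x)
theta = levenberg_marquardt(f, jac, np.array([1.0, 0.0]), x, y)
```

On this noiseless problem the accepted steps drive $\lambda$ toward zero, so the final iterations are essentially Gauss-Newton steps.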
3. Application of LMA for Mackey-Glass Chaotic Time Series
In this section, we derive the LMA for the case where the MLP is used for Mackey-Glass chaotic time series prediction. Suppose that we use $x(t - \Delta T_0), x(t - \Delta T_1), \ldots, x(t - \Delta T_{n-1})$ to predict the future variable $x(t + \Delta T)$, where $\Delta T_0 = 0$.
To implement the LMA, what we should do is calculate the Jacobian matrix $\mathbf{J}$, whose $i$th row equals $J_i$. According to (3) and (4), the function $f(\mathbf{x}, \theta)$ can be expressed as

$$f(\mathbf{x}, \theta) = \sum_{j=1}^{m} v_j\,\varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big) + b'. \quad (20)$$
What we should do is calculate $J_i = \partial f(\mathbf{x}_i, \theta)/\partial\theta$. The derivatives of $f(\mathbf{x}, \theta)$ with respect to $\theta$ are

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial b_j} = v_j\,\varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big)\Big[1 - \varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big)\Big],$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial w_{ij}} = v_j\,\varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big)\Big[1 - \varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big)\Big]\,x(t - \Delta T_{i-1}),$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial b'} = 1,$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial v_j} = \varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big),$$

$$i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, m. \quad (21)$$
As we know, $\theta = (b_1, \ldots, b_m, w_{11}, \ldots, w_{n1}, w_{12}, \ldots, w_{n2}, \ldots, w_{1m}, \ldots, w_{nm}, b', v_1, \ldots, v_m) \in R^{(n+1)m+m+1}$, so $J_i$, the $i$th row of $\mathbf{J}$, can easily be obtained according to (21). With $\mathbf{J}$ calculated, the LMA for Mackey-Glass chaotic time series prediction with the MLP follows directly.
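The analytic rows of $\mathbf{J}$ given by (21) can be verified against central finite differences. In the sketch below, the helper names (`unpack`, `jacobian_row`) are our own, and $\theta$ is packed as biases, flattened weight block, output bias, and output weights, analogous to (9) (the weight block uses row-major order internally, which is consistent between the two functions being compared).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, n, m):
    """Split theta = (b_1..b_m, W flattened, b', v_1..v_m), analogous to (9)."""
    b = theta[:m]
    W = theta[m:m + n * m].reshape(n, m)
    b_out = theta[m + n * m]
    v = theta[m + n * m + 1:]
    return b, W, b_out, v

def f(x, theta, n, m):
    """MLP output (20) for one input vector x of length n."""
    b, W, b_out, v = unpack(theta, n, m)
    return sigmoid(x @ W + b) @ v + b_out

def jacobian_row(x, theta, n, m):
    """Analytic derivatives (21), concatenated in the order of theta."""
    b, W, b_out, v = unpack(theta, n, m)
    h = sigmoid(x @ W + b)                 # hidden unit outputs, shape (m,)
    d_b = v * h * (1.0 - h)                # df/db_j
    d_W = np.outer(x, d_b)                 # df/dw_ij = d_b[j] * x_i
    return np.concatenate([d_b, d_W.ravel(), [1.0], h])

# compare against central finite differences for n = 4 inputs, m = 20 hidden units
rng = np.random.default_rng(0)
n, m = 4, 20
theta = rng.normal(size=(n + 1) * m + m + 1)   # 121 parameters, as in (9)
x = rng.normal(size=n)
eps = 1e-6
numeric = np.array([(f(x, theta + eps * e, n, m) - f(x, theta - eps * e, n, m)) / (2 * eps)
                    for e in np.eye(theta.size)])
assert np.allclose(jacobian_row(x, theta, n, m), numeric, atol=1e-6)
```

Stacking `jacobian_row` over all training inputs gives the matrix $\mathbf{J}$ needed by the update (19).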
4. Numerical Simulations
Example 1. We conduct an experiment to show the efficiency of the Levenberg-Marquardt algorithm. We choose a chaotic time series created by the Mackey-Glass delay differential equation

$$\frac{dx(t)}{dt} = \frac{0.2\,x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1\,x(t) \quad (22)$$

for $\tau = 17$. Such a series has some short-range time coherence, but long-term prediction is very difficult. The need to predict such a time series arises, for example, in detecting arrhythmias in heartbeats.
The network is given no information about the generator of the time series and is asked to predict the future of the time series from a few samples of its history. In our example, we trained the network to predict the value at time $T + \Delta T$ from inputs at times $T$, $T - 6$, $T - 12$, and $T - 18$, and we adopt $\Delta T = 50$ here.
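The construction of (input, target) pairs described above can be sketched as follows. The helper `make_dataset` is our own; a sine wave stands in for the actual series, which in practice would be generated from equation (22).

```python
import numpy as np

def make_dataset(series, lags=(0, 6, 12, 18), horizon=50):
    """Build (input, target) pairs: inputs x(T), x(T-6), x(T-12), x(T-18)
    and target x(T + 50), following the setup described in the text."""
    max_lag = max(lags)
    X, y = [], []
    for T in range(max_lag, len(series) - horizon):
        X.append([series[T - lag] for lag in lags])
        y.append(series[T + horizon])
    return np.array(X), np.array(y)

# stand-in series; in practice, use data generated from equation (22)
series = np.sin(0.1 * np.arange(600))
X, y = make_dataset(series)
```

Each row of `X` then serves as one input vector $\mathbf{x}_i$ for the MLP, with the corresponding entry of `y` as the training target.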
In the simulation, 3000 training examples and 500 test examples are generated by (22). We use the following multilayer perceptron to fit the generated training examples:
$$x(t + 50) = \sum_{j=1}^{20} v_j\,\varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big) + b'; \quad (23)$$

that is, the number of hidden units is $m = 20$ and the dimension of the input is $n = 4$.
Let $f(\mathbf{x}, \theta) = \sum_{j=1}^{20} v_j\,\varphi\big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\big) + b'$; then we obtain the following according to (21):

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial b_j} = v_j\,\varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big)\Big[1 - \varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big)\Big],$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial w_{ij}} = v_j\,\varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big)\Big[1 - \varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big)\Big]\,x(t - 6(i-1)),$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial b'} = 1,$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial v_j} = \varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big),$$

$$i = 1, 2, 3, 4, \quad j = 1, 2, \ldots, 20. \quad (24)$$
The initial values of the parameters are selected randomly:

$$W_{ij} \sim U(2, 4), \quad V_j \sim U(1, 2), \quad i = 1, 2, \ldots, 21, \quad j = 1, 2, \ldots, 5. \quad (25)$$
The learning curves of the error function and the fitting results of the LMA and the GDA are shown in Figures 3, 4, 5, and 6, respectively.
The learning curves of the GDA and the LMA are shown in Figures 3 and 4, respectively. The training error $\sum_{i=1}^{3000}(y_i - f(\mathbf{x}_i, \theta))^2$ of the LMA can reach 0.1, while the final training error of the GDA is more than 90. Furthermore, the final mean test error $(1/500)\sum_{i=5001}^{5500}(y_i - f(\mathbf{x}_i, \theta))^2 = 0.0118$ of the LMA is much smaller than 0.2296, the final test error of the GDA.
[Figure 3: The GDA learning curve of the error function.]
[Figure 4: The LMA learning curve of the error function.]
As far as the fitting effect is concerned, the performance of the LMA is much better than that of the GDA; this is evident from Figures 5 and 6.
All of this suggests that when we predict the Mackey-Glass chaotic time series, the performance of the LMA is very good; it can effectively overcome the difficulties that may arise with the GDA.
5. Conclusions and Discussions
In this paper, we discussed the application of the Levenberg-Marquardt algorithm to Mackey-Glass chaotic time series prediction. We used a multilayer perceptron with 20 hidden units to approximate and predict the Mackey-Glass chaotic time series. In the process of minimizing the error function, we adopted the Levenberg-Marquardt algorithm. If the reduction of $L(\theta)$ is rapid, a smaller value of the damping factor $\lambda$ can be used, bringing the algorithm closer to the Gauss-Newton algorithm, whereas if an iteration gives insufficient
[Figure 5: Fitting of the Mackey-Glass chaotic time series (GDA): model output versus test samples.]
[Figure 6: Fitting of the Mackey-Glass chaotic time series (LMA): model output versus test samples.]
reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient descent direction. In this paper, the learning mode is batch. Finally, we demonstrated the performance of the LMA. Simulations show that the LMA can achieve much better prediction efficiency than the gradient descent method.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This project is supported by the National Natural Science Foundation of China under Grants 61174076, 11471152, and 61403178.
References
[1] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, no. 4300, pp. 287–289, 1977.
[2] L. Glass and M. C. Mackey, "Mackey-Glass equation," Scholarpedia, vol. 5, no. 3, p. 6908, 2010.
[3] J. Hale, Theory of Functional Differential Equations, Springer, New York, NY, USA, 1977.
[4] J. Mallet-Paret and R. D. Nussbaum, "A differential-delay equation arising in optics and physiology," SIAM Journal on Mathematical Analysis, vol. 20, no. 2, pp. 249–292, 1989.
[5] H. O. Walther, "The 2-dimensional attractor of $dx/dt = -\mu x(t) + f(x(t-1))$," Memoirs of the American Mathematical Society, vol. 113, no. 544, p. 76, 1995.
[6] J. Mallet-Paret and G. R. Sell, "The Poincaré-Bendixson theorem for monotone cyclic feedback systems with delay," Journal of Differential Equations, vol. 125, no. 2, pp. 441–489, 1996.
[7] B. Lani-Wayda and H.-O. Walther, "Chaotic motion generated by delayed negative feedback, part II: construction of nonlinearities," Mathematische Nachrichten, vol. 180, pp. 181–211, 2000.
[8] G. Röst and J. H. Wu, "Domain-decomposition method for the global dynamics of delay differential equations with unimodal feedback," Proceedings of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, vol. 463, no. 2086, pp. 2655–2669, 2007.
[9] M. Awad, H. Pomares, I. Rojas, O. Salameh, and M. Hamdon, "Prediction of time series using RBF neural networks: a new approach of clustering," International Arab Journal of Information Technology, vol. 6, no. 2, pp. 138–143, 2009.
[10] M. Awad, "Chaotic time series prediction using wavelet neural network," Journal of Artificial Intelligence: Theory and Application, vol. 1, no. 3, pp. 73–80, 2010.
[11] I. Lopez-Yanez, L. Sheremetov, and C. Yanez-Marquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, pp. 23–33, 2014.
[12] A. C. Fowler, "Respiratory control and the onset of periodic breathing," Mathematical Modelling of Natural Phenomena, vol. 9, no. 1, pp. 39–57, 2014.
[13] N. Wang, M. J. Er, and M. Han, "Generalized single-hidden layer feedforward networks for regression problems," IEEE Transactions on Neural Networks and Learning Systems, 2014.
[14] N. Wang, "A generalized ellipsoidal basis function based online self-constructing fuzzy neural network," Neural Processing Letters, vol. 34, no. 1, pp. 13–37, 2011.
[15] H. K. Wei, Theory and Method of the Neural Networks Architecture Design, National Defence Industry Press, Beijing, China, 2005.
[16] H. Wei and S.-I. Amari, "Dynamics of learning near singularities in radial basis function networks," Neural Networks, vol. 21, no. 7, pp. 989–1005, 2008.
[17] H. Wei, J. Zhang, F. Cousseau, T. Ozeki, and S.-I. Amari, "Dynamics of learning near singularities in layered networks," Neural Computation, vol. 20, no. 3, pp. 813–843, 2008.
[18] K. Levenberg, "A method for the solution of certain non-linear problems in least squares," Quarterly of Applied Mathematics, vol. 2, pp. 164–168, 1944.
[19] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.
[20] W. Guo, H. Wei, J. Zhao, and K. Zhang, "Averaged learning equations of error-function-based multilayer perceptrons," Neural Computing and Applications, vol. 25, no. 3-4, pp. 825–832, 2014.
[21] P. E. Gill and W. Murray, "Algorithms for the solution of the nonlinear least-squares problem," SIAM Journal on Numerical Analysis, vol. 15, no. 5, pp. 977–992, 1978.
[22] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM Journal on Applied Mathematics, vol. 11, no. 2, pp. 431–441, 1963.
2 Discrete Dynamics in Nature and Society
W
b1
V
bm
b998400
y
isin Rnx
Figure 1 Multilayer Perceptrons
much more difficult However rigorous proofs for chaoticdynamics have been obtained for the differential delayequation 119889119909119889119905 = 119892(119909(119905 minus 1)) for special classes of thefeedback function 119892 [7] Further although a proof of chaoticdynamics in the Mackey-Glass equation has still not beenfound advances in understanding the properties of delaydifferential equations is going on such as (2) that containboth exponential decay and nonmonotonic delayed feedback[8] The study of this equation remains a topic of vigorousresearch
The Mackey-Glass chaotic time series prediction is avery difficult task The aim is to predict the future state119909(119905 + Δ119879) using the current and the past time series 119909(119905)119909(119905 minus 1) 119909(119905 minus 119899) (Figure 2) Until now there are manyliteratures about the Mackey-Glass chaotic time series pre-diction [9ndash14] However as far as the prediction accuracy isconcerned most of the results in the literature are not ideal
In this paper we will predict the Mackey-Glass chaotictime series by the MLP While minimizing the loss functionwe introduce the LMA which can adjust the convergencespeed and obtain good convergence efficiency
The rest of the paper is organized as follows In Section 2we describe the multilayer perceptron Section 3 introducesthe LMA and discusses how to implement the LMA InSection 4 we give a numerical example to demonstratethe prediction efficiency Section 5 is the conclusions anddiscussions of the paper
2 Preliminaries
21 Multilayer Perceptrons A multilayer perceptron (MLP)is a feedforward artificial neural network model that mapssets of input data onto a set of appropriate outputs A MLPconsists of multiple layers of nodes in a directed graphwith each layer fully connected to the next one Exceptfor the input nodes each node is a neuron (or processingelement) with a nonlinear activation functionThemultilayerperceptron with only one hidden layer is depicted as inFigure 1 [15]
In Figure 1 x = [1199091 1199092 119909
119899]119879
isin 119877119899 is the model
input 119910 is the model output 119882 = 119908119894119895 119894 = 1 2 119899 119895 =
1 2 119898 is the connection weight from the 119909119894to the 119895th
hidden unit 119881 = V119895 is the connection weight from the 119895th
0 100 200 300 400 50004
05
06
07
08
09
1
11
12
13
14
Figure 2 Mackey-Glass chaotic time series
hidden unit to the output unit 119887119895 119895 = 1 2 119898 and 119887
1015840 arethe bias
The output of the multilayer perceptron described inFigure 1 is
119910 =
119898
sum
119895=1
V119895ℎ119895+ 1198871015840
(3)
and the outputs of the hidden units are
ℎ119895= 120593(
119899
sum
119894=1
119908119894119895119909119894+ 119887119895) (4)
respectively where 120593(sdot) is the activation function We willadopt the sigmoid function 120593(119909) as the activation functionfor example
120593 (119909) =1
1 + 119890minus119909 (5)
and the derivative of the activation function with respect to 119909
is
1205931015840
(119909) =119890minus119909
(1 + 119890minus119909)2 (6)
or
1205931015840
(119909) = 120593 (119909) (1 minus 120593 (119909)) (7)
MLP provides a universal method for function approxi-mation and classification [16 17] In the case of the functionapproximation we have a number of observed data (x
1 1199101)
(x2 1199102) (x
119871 119910119871) which are supposed to be generated by
119910 = 1198910(x) + 120585 (8)
where 120585 is noise usually subject to Gaussian distributionwith zero mean and 119891
0(x) is the unknown true generating
function
Discrete Dynamics in Nature and Society 3
Given a set of observed data sometimes called trainingexamples we search for the parameters
120579 = (1198871 119887
119898 11990811 119908
1198991 11990812 119908
1198992 119908
1119898
119908119899119898
1198871015840
V1 V
119898) isin 119877(119899+1)119898+119898+1
(9)
to approximate the teacher function 1198910(x) best where 119879
denotes the matrix transposition A satisfactory model isoften obtained by minimizing the mean square error
One of the serious problems in minimizing the meansquare error is that the convergence speed of the loss functionis rapid in the beginning of the learning process whilethe convergence speed is very slow in the region of theminimum [18] In order to overcome these problems we willintroduce the Levenberg-Marquardt algorithm (LMA) in thenext section
22 The Levenberg-Marquardt Algorithm In mathematicsand computing the Levenberg-Marquardt algorithm (LMA)[18ndash20] also known as the damped least-squares (DLS)method is used to solve nonlinear least squares problemsTheseminimization problems arise especially in least squarescurve fitting
The LMA is interpolates between the Gauss-Newtonalgorithm (GNA) and the gradient descent algorithm (GDA)As far as the robustness is concerned the LMA performsbetter than the GNA which means that in many cases it findsa solution even if it starts very far away from the minimumHowever for well-behaved functions and reasonable startingparameters the LMA tends to be a bit slower than the GNA
In many real applications for solving model fitting prob-lems we often adopt the LMA However like many otherfitting algorithms the LMA finds only a local minimumwhich is always not the global minimum
The least squares curve fitting problem is described asfollows Instead of the unknown true model a set of 119873 pairsof independent variables (x
1 1199101) (x2 1199102) (x
119873 119910119873) are
given Suppose that 119891(119909 120579) is the approximation model and119871(120579) is a loss function which is the sum of the squares of thedeviations
119871 (120579) =
119873
sum
119894=1
[119910119894minus 119891 (x
119894 120579)]2
(10)
The task of curve fitting problem isminimizing the above lossfunction 119871(120579) [21]
The LMA is an iterative algorithm, and the parameter \(\theta\) is adjusted in each iteration step. Generally speaking, we choose an initial parameter randomly, for example, \(\theta_i \sim U(-1, 1)\), \(i = 1, 2, \dots, n\), where \(n\) is the dimension of the parameter \(\theta\).
The Taylor expansion of the function \(f(\mathbf{x}_i, \theta + \Delta\theta)\) is

\[ f(\mathbf{x}_i, \theta + \Delta\theta) \approx f(\mathbf{x}_i, \theta) + \frac{\partial f(\mathbf{x}_i, \theta)}{\partial \theta}\, \Delta\theta. \tag{11} \]
As we know, at the minimum \(\theta^{*}\) of the loss function \(L(\theta)\), the gradient of \(L(\theta)\) with respect to \(\theta\) is zero. Substituting (11) into (10), we obtain

\[ L(\theta + \Delta\theta) \approx \sum_{i=1}^{N} \bigl(y_i - f(\mathbf{x}_i, \theta) - J_i \Delta\theta\bigr)^2, \tag{12} \]
where \(J_i = \partial f(\mathbf{x}_i, \theta)/\partial \theta\), or, in vector form,

\[ L(\theta + \Delta\theta) \approx \bigl\| \mathbf{y} - \mathbf{f}(\theta) - \mathbf{J}\,\Delta\theta \bigr\|^2. \tag{13} \]
Taking the derivative with respect to \(\Delta\theta\) and setting the result to zero gives

\[ \bigl(\mathbf{J}^T \mathbf{J}\bigr)\, \Delta\theta = \mathbf{J}^T \bigl(\mathbf{y} - \mathbf{f}(\theta)\bigr), \tag{14} \]

where \(\mathbf{J}\) is the Jacobian matrix whose \(i\)th row equals \(J_i\), and \(\mathbf{y}\) and \(\mathbf{f}(\theta)\) are vectors with \(i\)th components \(y_i\) and \(f(\mathbf{x}_i, \theta)\), respectively. This is a set of linear equations, which can be solved for \(\Delta\theta\).
Levenberg's contribution is to replace this equation by a "damped version,"

\[ \bigl(\mathbf{J}^T \mathbf{J} + \lambda I\bigr)\, \Delta\theta = \mathbf{J}^T \bigl(\mathbf{y} - \mathbf{f}(\theta)\bigr), \tag{15} \]

where \(I\) is the identity matrix, giving the increment \(\Delta\theta\) to the estimated parameter vector \(\theta\).
The damping factor \(\lambda\) is adjusted at each iteration step. If the loss function \(L(\theta)\) decreases rapidly, \(\lambda\) is given a small value, and the LMA becomes similar to the Gauss-Newton algorithm. If the loss function \(L(\theta)\) decreases very slowly, \(\lambda\) can be increased, giving a step closer to the gradient descent direction, since

\[ \frac{\partial L(\theta)}{\partial \Delta\theta} = -2\,\mathbf{J}^T \bigl(\mathbf{y} - \mathbf{f}(\theta)\bigr). \tag{16} \]

Therefore, for large values of \(\lambda\), the step is taken approximately in the direction of the gradient.
In the iteration process, if either the length of the calculated step \(\Delta\theta\) or the reduction of \(L(\theta)\) from the latest parameter vector \(\theta + \Delta\theta\) falls below predefined limits, the iteration stops, and we take the last parameter vector \(\theta\) as the final solution.
Levenberg's algorithm has the disadvantage that if the value of the damping factor \(\lambda\) is large, the curvature information in \(\mathbf{J}^T\mathbf{J}\) is effectively not used at all. Marquardt provided the insight that we can scale each component of the gradient according to the curvature, so that there is larger movement along the directions where the gradient is smaller. This avoids slow convergence in directions of small gradient. Therefore, Marquardt replaced the identity matrix \(I\) with the diagonal matrix consisting of the diagonal elements of \(\mathbf{J}^T\mathbf{J}\), resulting in the Levenberg-Marquardt algorithm [22]:

\[ \bigl[\mathbf{J}^T \mathbf{J} + \lambda\, \operatorname{diag}(\mathbf{J}^T \mathbf{J})\bigr]\, \Delta\theta = \mathbf{J}^T \bigl(\mathbf{y} - \mathbf{f}(\theta)\bigr), \tag{17} \]
and the LMA update is as follows:

\[ \theta_{t+1} = \theta_t + \Delta\theta, \tag{18} \]

where

\[ \Delta\theta = \bigl[\mathbf{J}^T \mathbf{J} + \lambda\, \operatorname{diag}(\mathbf{J}^T \mathbf{J})\bigr]^{-1} \mathbf{J}^T \bigl(\mathbf{y} - \mathbf{f}(\theta_t)\bigr). \tag{19} \]
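The update (17)–(19), together with the damping adjustment described above, can be sketched as follows. The multiply/divide-by-10 schedule, the function names, and the exponential usage example are our illustrative choices; the paper does not specify them:

```python
import numpy as np

def levenberg_marquardt(f, jac, theta0, xs, ys, lam=1e-3, max_iter=100, tol=1e-10):
    """Sketch of the LMA with Marquardt's diagonal scaling, eqs. (17)-(19).

    f(xs, theta) -> model outputs (N,); jac(xs, theta) -> Jacobian J (N, p).
    lam is decreased after a successful step (toward Gauss-Newton) and
    increased after a failed one (toward gradient descent)."""
    theta = np.asarray(theta0, dtype=float)
    loss = float(np.sum((ys - f(xs, theta)) ** 2))
    for _ in range(max_iter):
        J = jac(xs, theta)
        r = ys - f(xs, theta)
        A = J.T @ J
        # damped normal equations: [J^T J + lam*diag(J^T J)] dtheta = J^T r
        delta = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)
        new_theta = theta + delta
        new_loss = float(np.sum((ys - f(xs, new_theta)) ** 2))
        if new_loss < loss:                    # accept the step
            improvement = loss - new_loss
            theta, loss, lam = new_theta, new_loss, lam / 10
            if improvement < tol or np.linalg.norm(delta) < tol:
                break                          # stopping rule from the text
        else:                                  # reject: increase damping
            lam *= 10
    return theta

# usage: fit y = a*exp(b*x) on noiseless synthetic data
xs = np.linspace(0.0, 1.0, 50)
ys = 2.0 * np.exp(-1.5 * xs)
f = lambda x, th: th[0] * np.exp(th[1] * x)
jac = lambda x, th: np.stack([np.exp(th[1] * x),
                              th[0] * x * np.exp(th[1] * x)], axis=1)
theta = levenberg_marquardt(f, jac, [1.0, 0.0], xs, ys)
```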
4 Discrete Dynamics in Nature and Society
3 Application of LMA for Mackey-GlassChaotic Time Series
In this section we will derive the LMAwhen theMLP is usedfor the Mackey-Glass chaotic time series prediction Supposethat we use the 119909(119905) minus Δ119879
0 119909(119905 minus Δ119879
1) 119909(119905 minus Δ119879
119899minus1) to
predict the future variable 119909(119905 + Δ119879) where Δ1198790= 0
To implement the LMA, what we should do is calculate the Jacobian matrix \(\mathbf{J}\), whose \(i\)th row equals \(J_i\). According to (3) and (4), the function \(f(\mathbf{x}, \theta)\) can be expressed as

\[ f(\mathbf{x}, \theta) = \sum_{j=1}^{m} v_j\, \varphi\!\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) + b'. \tag{20} \]
What we should do is calculate \(J_i = \partial f(\mathbf{x}_i, \theta)/\partial \theta\). The derivatives of \(f(\mathbf{x}, \theta)\) with respect to \(\theta\) are

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial b_j} = v_j\,\varphi\!\left(\sum_{i=1}^{n} w_{ij}\,x(t-\Delta T_{i-1}) + b_j\right)\left[1-\varphi\!\left(\sum_{i=1}^{n} w_{ij}\,x(t-\Delta T_{i-1}) + b_j\right)\right], \]

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial w_{ij}} = v_j\,\varphi\!\left(\sum_{i=1}^{n} w_{ij}\,x(t-\Delta T_{i-1}) + b_j\right)\left[1-\varphi\!\left(\sum_{i=1}^{n} w_{ij}\,x(t-\Delta T_{i-1}) + b_j\right)\right] x(t-\Delta T_{i-1}), \]

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial b'} = 1, \qquad \frac{\partial f(\mathbf{x},\theta)}{\partial v_j} = \varphi\!\left(\sum_{i=1}^{n} w_{ij}\,x(t-\Delta T_{i-1}) + b_j\right), \tag{21} \]

for \(i = 1, 2, \dots, n\) and \(j = 1, 2, \dots, m\).
As we know, \(\theta = (b_1,\dots,b_m,\; w_{11},\dots,w_{n1},\; w_{12},\dots,w_{n2},\;\dots,\; w_{1m},\dots,w_{nm},\; b',\; v_1,\dots,v_m) \in \mathbb{R}^{(n+1)m+m+1}\), so \(J_i\), the \(i\)th row of \(\mathbf{J}\), can easily be obtained according to (21). Once \(\mathbf{J}\) is calculated, the LMA for MLP-based Mackey-Glass chaotic time series prediction follows.
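One row of \(\mathbf{J}\) can be assembled from (21) as below, assuming \(\varphi\) is the logistic sigmoid (consistent with the \(\varphi(1-\varphi)\) factor in (21)); the function name, the parameter ordering, and the finite-difference check are our illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_jacobian_row(x, W, b, v, b_out):
    """One row J_i for f(x) = sum_j v_j*phi(W[:, j] @ x + b_j) + b',
    using eq. (21) and phi' = phi*(1 - phi) for the logistic sigmoid.
    Row ordered as (df/db, df/dW flattened, df/db', df/dv)."""
    h = sigmoid(W.T @ x + b)           # hidden activations, shape (m,)
    g = v * h * (1.0 - h)              # common factor v_j*phi*(1-phi)
    d_b = g                            # df/db_j
    d_W = np.outer(x, g)               # df/dw_ij = g_j * x_i, shape (n, m)
    d_bout = np.array([1.0])           # df/db'
    d_v = h                            # df/dv_j
    return np.concatenate([d_b, d_W.ravel(), d_bout, d_v])

# sanity check: compare df/dv_1 against a finite difference
rng = np.random.default_rng(0)
n, m = 4, 3
W, b = rng.normal(size=(n, m)), rng.normal(size=m)
v, b_out = rng.normal(size=m), 0.5
x = rng.normal(size=n)
f = lambda v_: v_ @ sigmoid(W.T @ x + b) + b_out
eps = 1e-6
v1 = v.copy(); v1[0] += eps
fd = (f(v1) - f(v)) / eps
row = mlp_jacobian_row(x, W, b, v, b_out)
```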
4 Numerical Simulations
Example 1. We conduct an experiment to show the efficiency of the Levenberg-Marquardt algorithm. We choose a chaotic time series created by the Mackey-Glass delay differential equation

\[ \frac{dx(t)}{dt} = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t) \tag{22} \]

for \(\tau = 17\).

Such a series has some short-range time coherence, but long-term prediction is very difficult. The need to predict such a time series arises, for example, in detecting arrhythmias in heartbeats.
The network is given no information about the generator of the time series and is asked to predict the future of the time series from a few samples of its history. In our example, we train the network to predict the value at time \(T + \Delta T\) from inputs at times \(T\), \(T-6\), \(T-12\), and \(T-18\), and we adopt \(\Delta T = 50\).
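The series (22) and the lagged training pairs described above can be generated as follows. The paper does not state its integration scheme, so the simple Euler step with \(dt = 1\), the initial condition, and the function names are our assumptions:

```python
import numpy as np

def mackey_glass(n_steps, tau=17, beta=0.2, gamma=0.1, x0=1.2, dt=1.0):
    """Integrate eq. (22) with a forward Euler scheme (illustrative choice)."""
    hist = int(tau / dt)
    x = np.full(n_steps + hist, x0)    # constant history on [-tau, 0]
    for t in range(hist, n_steps + hist - 1):
        delayed = x[t - hist]
        x[t + 1] = x[t] + dt * (beta * delayed / (1.0 + delayed ** 10)
                                - gamma * x[t])
    return x[hist:]

def make_examples(series, lags=(0, 6, 12, 18), horizon=50):
    """Inputs x(T), x(T-6), x(T-12), x(T-18); target x(T+50)."""
    start, stop = max(lags), len(series) - horizon
    X = np.stack([[series[t - d] for d in lags] for t in range(start, stop)])
    y = series[start + horizon: stop + horizon]
    return X, y

series = mackey_glass(4000)
X, y = make_examples(series)
```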
In the simulation, 3000 training examples and 500 test examples are generated by (22). We use the following multilayer perceptron to fit the generated training examples:

\[ x(t+50) = \sum_{j=1}^{20} V_j\, \varphi\!\left( \sum_{i=1}^{4} x(t - 6(i-1))\, w_{ij} + b_j \right) + b'; \tag{23} \]

that is, the number of hidden units is \(m = 20\) and the dimension of the input is \(n = 4\).
Let \(f(\mathbf{x}, \theta) = \sum_{j=1}^{20} V_j \varphi(\sum_{i=1}^{4} x(t - 6(i-1)) w_{ij} + b_j) + b'\). Then we obtain the following equations according to (21):

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial b_j} = V_j\,\varphi\!\left(\sum_{i=1}^{4} w_{ij}\,x(t-6(i-1)) + b_j\right)\left[1-\varphi\!\left(\sum_{i=1}^{4} w_{ij}\,x(t-6(i-1)) + b_j\right)\right], \]

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial w_{ij}} = V_j\,\varphi\!\left(\sum_{i=1}^{4} w_{ij}\,x(t-6(i-1)) + b_j\right)\left[1-\varphi\!\left(\sum_{i=1}^{4} w_{ij}\,x(t-6(i-1)) + b_j\right)\right] x(t-6(i-1)), \]

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial b'} = 1, \qquad \frac{\partial f(\mathbf{x},\theta)}{\partial V_j} = \varphi\!\left(\sum_{i=1}^{4} w_{ij}\,x(t-6(i-1)) + b_j\right), \tag{24} \]

for \(i = 1, 2, 3, 4\) and \(j = 1, 2, \dots, 20\).
The initial values of the parameters are selected randomly:

\[ W_{ij} \sim U(2, 4), \quad V_j \sim U(1, 2), \quad i = 1, 2, 3, 4,\; j = 1, 2, \dots, 20. \tag{25} \]
The learning curves of the error function and the fitting results of the LMA and GDA are shown in Figures 3, 4, 5, and 6, respectively.

The learning curves of the GDA and LMA are shown in Figures 3 and 4, respectively. The training error \(\sum_{i=1}^{3000} (y_i - f(\mathbf{x}_i, \theta))^2\) of the LMA can reach 0.1, while the final training error of the GDA is more than 90. Furthermore, the final mean test error of the LMA, \((1/500)\sum_{i=5001}^{5500} (y_i - f(\mathbf{x}_i, \theta))^2 = 0.0118\), is much smaller than 0.2296, the final test error of the GDA.
Figure 3: The GDA learning curve of the error function (training error versus training number).
Figure 4: The LMA learning curve of the error function (training error versus training number).
As far as the fitting effect is concerned, the performance of the LMA is much better than that of the GDA, as is evident from Figures 5 and 6.
All of this suggests that, when predicting the Mackey-Glass chaotic time series, the performance of the LMA is very good; it can effectively overcome the difficulties that may arise with the GDA.
5 Conclusions and Discussions

In this paper, we discussed the application of the Levenberg-Marquardt algorithm to Mackey-Glass chaotic time series prediction. We used a multilayer perceptron with 20 hidden units to approximate and predict the Mackey-Glass chaotic time series. In the process of minimizing the error function, we adopted the Levenberg-Marquardt algorithm. If the reduction of \(L(\theta)\) is rapid, a smaller value of the damping factor \(\lambda\) can be used, bringing the algorithm closer to the Gauss-Newton algorithm, whereas if an iteration gives insufficient
Figure 5: Fitting of the Mackey-Glass chaotic time series by the GDA (model output versus test sample output).
Figure 6: Fitting of the Mackey-Glass chaotic time series by the LMA (model output versus test sample output).
reduction in the residual, \(\lambda\) can be increased, giving a step closer to the gradient descent direction. In this paper, the learning mode is batch. Finally, we demonstrated the performance of the LMA. Simulations show that the LMA can achieve much better prediction accuracy than the gradient descent method.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
6 Discrete Dynamics in Nature and Society
Acknowledgment
This project is supported by the National Natural Science Foundation of China under Grants 61174076, 11471152, and 61403178.
References
[1] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, no. 4300, pp. 287–289, 1977.
[2] L. Glass and M. C. Mackey, "Mackey-Glass equation," Scholarpedia, vol. 5, no. 3, p. 6908, 2010.
[3] J. Hale, Theory of Functional Differential Equations, Springer, New York, NY, USA, 1977.
[4] J. Mallet-Paret and R. D. Nussbaum, "A differential-delay equation arising in optics and physiology," SIAM Journal on Mathematical Analysis, vol. 20, no. 2, pp. 249–292, 1989.
[5] H. O. Walther, "The 2-dimensional attractor of \(dx/dt = -\mu x(t) + f(x(t-1))\)," Memoirs of the American Mathematical Society, vol. 113, no. 544, p. 76, 1995.
[6] J. Mallet-Paret and G. R. Sell, "The Poincaré-Bendixson theorem for monotone cyclic feedback systems with delay," Journal of Differential Equations, vol. 125, no. 2, pp. 441–489, 1996.
[7] B. Lani-Wayda and H.-O. Walther, "Chaotic motion generated by delayed negative feedback, part II: construction of nonlinearities," Mathematische Nachrichten, vol. 180, pp. 181–211, 2000.
[8] G. Rost and J. H. Wu, "Domain-decomposition method for the global dynamics of delay differential equations with unimodal feedback," Proceedings of The Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, vol. 463, no. 2086, pp. 2655–2669, 2007.
[9] M. Awad, H. Pomares, I. Rojas, O. Salameh, and M. Hamdon, "Prediction of time series using RBF neural networks: a new approach of clustering," International Arab Journal of Information Technology, vol. 6, no. 2, pp. 138–143, 2009.
[10] M. Awad, "Chaotic time series prediction using wavelet neural network," Journal of Artificial Intelligence: Theory and Application, vol. 1, no. 3, pp. 73–80, 2010.
[11] I. Lopez-Yanez, L. Sheremetov, and C. Yanez-Marquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, pp. 23–33, 2014.
[12] A. C. Fowler, "Respiratory control and the onset of periodic breathing," Mathematical Modelling of Natural Phenomena, vol. 9, no. 1, pp. 39–57, 2014.
[13] N. Wang, M. J. Er, and M. Han, "Generalized single-hidden layer feedforward networks for regression problems," IEEE Transactions on Neural Networks and Learning Systems, 2014.
[14] N. Wang, "A generalized ellipsoidal basis function based online self-constructing fuzzy neural network," Neural Processing Letters, vol. 34, no. 1, pp. 13–37, 2011.
[15] H. K. Wei, Theory and Method of the Neural Networks Architecture Design, National Defence Industry Press, Beijing, China, 2005.
[16] H. Wei and S.-I. Amari, "Dynamics of learning near singularities in radial basis function networks," Neural Networks, vol. 21, no. 7, pp. 989–1005, 2008.
[17] H. Wei, J. Zhang, F. Cousseau, T. Ozeki, and S.-I. Amari, "Dynamics of learning near singularities in layered networks," Neural Computation, vol. 20, no. 3, pp. 813–843, 2008.
[18] K. Levenberg, "A method for the solution of certain non-linear problems in least squares," Quarterly of Applied Mathematics, vol. 2, pp. 164–168, 1944.
[19] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.
[20] W. Guo, H. Wei, J. Zhao, and K. Zhang, "Averaged learning equations of error-function-based multilayer perceptrons," Neural Computing and Applications, vol. 25, no. 3-4, pp. 825–832, 2014.
[21] P. E. Gill and W. Murray, "Algorithms for the solution of the nonlinear least-squares problem," SIAM Journal on Numerical Analysis, vol. 15, no. 5, pp. 977–992, 1978.
[22] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM Journal on Applied Mathematics, vol. 11, no. 2, pp. 431–441, 1963.
Discrete Dynamics in Nature and Society 3
Given a set of observed data sometimes called trainingexamples we search for the parameters
120579 = (1198871 119887
119898 11990811 119908
1198991 11990812 119908
1198992 119908
1119898
119908119899119898
1198871015840
V1 V
119898) isin 119877(119899+1)119898+119898+1
(9)
to approximate the teacher function 1198910(x) best where 119879
denotes the matrix transposition A satisfactory model isoften obtained by minimizing the mean square error
One of the serious problems in minimizing the meansquare error is that the convergence speed of the loss functionis rapid in the beginning of the learning process whilethe convergence speed is very slow in the region of theminimum [18] In order to overcome these problems we willintroduce the Levenberg-Marquardt algorithm (LMA) in thenext section
22 The Levenberg-Marquardt Algorithm In mathematicsand computing the Levenberg-Marquardt algorithm (LMA)[18ndash20] also known as the damped least-squares (DLS)method is used to solve nonlinear least squares problemsTheseminimization problems arise especially in least squarescurve fitting
The LMA is interpolates between the Gauss-Newtonalgorithm (GNA) and the gradient descent algorithm (GDA)As far as the robustness is concerned the LMA performsbetter than the GNA which means that in many cases it findsa solution even if it starts very far away from the minimumHowever for well-behaved functions and reasonable startingparameters the LMA tends to be a bit slower than the GNA
In many real applications for solving model fitting prob-lems we often adopt the LMA However like many otherfitting algorithms the LMA finds only a local minimumwhich is always not the global minimum
The least squares curve fitting problem is described asfollows Instead of the unknown true model a set of 119873 pairsof independent variables (x
1 1199101) (x2 1199102) (x
119873 119910119873) are
given Suppose that 119891(119909 120579) is the approximation model and119871(120579) is a loss function which is the sum of the squares of thedeviations
119871 (120579) =
119873
sum
119894=1
[119910119894minus 119891 (x
119894 120579)]2
(10)
The task of curve fitting problem isminimizing the above lossfunction 119871(120579) [21]
The LMA is an iterative algorithm and the parameter120579 is adjusted in each iteration step Generally speaking wechoose an initial parameter randomly for example 120579
119894sim
119880(minus1 1) 119894 = 1 2 119899 where 119899 is the dimension ofparameter 120579
The Taylor expansion of the function 119891(x119894 120579 + Δ120579) is
119891 (x119894 120579 + Δ120579) asymp 119891 (x
119894 120579) +
120597119891 (x119894 120579)
120597120579Δ120579 (11)
As we know at the minimum 120579lowast of loss function 119871(120579) the
gradient of 119871(120579) with respect to 120579 will be zero Substituting(11) into (10) we can obtain
119871 (120579 + Δ120579) asymp
119873
sum
119894=1
(119910119894minus 119891 (x
119894 120579) minus J
119894Δ120579)2
(12)
where 119869119894= 120597119891(x
119894 120579)120597120579 or
119871 (120579 + Δ120579) asymp1003817100381710038171003817y minus f(120579) minus JΔ1205791003817100381710038171003817
2
(13)
Taking the derivative with respect toΔ120579 and setting the resultto zero give
(J119879J) Δ120579 = J119879 (y minus f (120579)) (14)
where J is the Jacobian matrix whose 119894th row equals 119869119894and
also y and f are vectors with 119894th component 119891(119909119894 120579) and 119910
119894
respectively This is a set of linear equations which can besolved for Δ120579
Levenbergrsquos contribution is to replace this equation by aldquodamped versionrdquo
(J119879J + 120582119868) Δ120579 = J119879 [y minus f (120579)] (15)
where 119868 is the identity matrix giving the increment Δ120579 to theestimated parameter vector 120579
The damping factor 120582 is adjusted at each iteration step Ifthe loss function 119871(120579) reduces rapidly 120582 will adopt a smallvalue and then the LMA is similar to the Gauss-Newtonalgorithm While the loss function 119871(120579) reduces very slowly120582 can be increased giving a step closer to the gradient descentdirection and
120597119871 (120579)
120597Δ120579= minus2J119879 [y minus f (120579)]119879 (16)
Therefore for large values of 120582 the step will be takenapproximately in the direction of the gradient
In the process of iteration if either the length of thecalculated step Δ120579 or the reduction of 119871(120579) from the latestparameter vector 120579 + Δ120579 falls below the predefined limitsiteration process stops and then we take the last parametervector 120579 as the final solution
Levenbergrsquos algorithm has the disadvantage that if thevalue of damping factor 120582 is large the inverse of J119879J + 120582119868
does not work at all Marquardt provided the insight thatwe can scale each component of the gradient according tothe curvature so that there is larger movement along thedirections where the gradient is smaller This avoids slowconvergence in the direction of small gradient ThereforeMarquardt replaced the identity matrix 119868 with the diagonalmatrix consisting of the diagonal elements of J119879J resulting inthe Levenberg-Marquardt algorithm [22]
[J119879J + 120582 diag (J119879J)] Δ120579 = J119879 [y minus f (120579)] (17)
and the LMA is as follows
120579119905+1
= 120579119905+ Δ120579 (18)
where
Δ120579 = [J119879J + 120582 diag (J119879J) ]minus1
J119879 [y minus f (120579)] (19)
4 Discrete Dynamics in Nature and Society
3 Application of LMA for Mackey-GlassChaotic Time Series
In this section we will derive the LMAwhen theMLP is usedfor the Mackey-Glass chaotic time series prediction Supposethat we use the 119909(119905) minus Δ119879
0 119909(119905 minus Δ119879
1) 119909(119905 minus Δ119879
119899minus1) to
predict the future variable 119909(119905 + Δ119879) where Δ1198790= 0
To implement the LMA what we should do is calculatethe Jacobian matrix J whose 119894th row equals 119869
119894 According to
(3) and (4) function 119891(x 120579) can be expressed as
119891 (x 120579) =119898
sum
119895=1
V119895120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895) + 1198871015840
(20)
What we should do is calculate the 119869119894= 120597119891(x
119894 120579)120597120579
The derivatives of 119891(x 120579) with respect to 120579 are
120597119891 (x 120579)120597119887119895
= V119895120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895)
times [1 minus 120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895)]
120597119891 (x 120579)120597119908119894119895
= V119895120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895)
times [1 minus 120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895)]
times 119909 (119905 minus Δ119879119894minus1
)
120597119891 (x 120579)1205971198871015840
= 1
120597119891 (x 120579)120597V119895
= 120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895)
119894 = 1 2 119899 119895 = 1 2 119898
(21)
As we know 120579 = (1198871 119887
119898 11990811 119908
1198991 11990812
1199081198992 119908
1119898 119908
119899119898 1198871015840
V1 V
119898) isin 119877(119899+1)119898+119898+1 so 119869
119894 the
119894th row of J can be easily obtained according to (21) J iscalculated and when MLP is used for Mackey-Glass chaotictime series prediction the LMA can also be obtained
4 Numerical Simulations
Example 1 We will conduct an experiment to show theefficiency of the Levenberg-Marquardt algorithmWe choosea chaotic time series created by the Mackey-Glass delay-difference equation
119889119909 (119905)
119889119905=
02119909 (119905 minus 120591)
1 + 11990910 (119905 minus 120591)minus 01119909 (119905) (22)
for 120591 = 17Such a series has some short-range time coherence but
long-term prediction is very difficult The need to predict
such a time series arises in detecting arrhythmias in heart-beats
The network is given no information about the generatorof the time series and is asked to predict the future of the timeseries from a few samples of the history of the time series Inour example we trained the network to predict the value attime 119879+Δ119879 from inputs at time 119879 119879minus 6 119879minus 12 and 119879minus 18and we will adopt Δ119879 = 50 here
In the simulation 3000 training examples and 500 testexamples are generated by (22) We use the following mul-tilayer perceptron for fitting the generated training examples
119909 (119905 + 50) =
20
sum
119895=1
119881119895120593(
4
sum
119894=1
119909 (119905 minus 6 (119894 minus 1)) 119908119894119895+ 119887119895) + 1198871015840
(23)
for example the number of the hidden units is 119899 = 20 and thedimension of the input is119898 = 4
Let 119891(x 120579) = sum20
119895=1119881119895120593(sum4
119894=1119909(119905 minus 6(119894 minus 1))119908
119894119895+ 119887119895) + 1198871015840
then we can obtain the following equation according to (21)
120597119891 (x 120579)120597119887119895
= V119895120593(
20
sum
119894=1
119908119894119895119909 (119905 minus 6 times (119894 minus 1)) 119908
119894119895+ 119887119895)
times [1 minus 120593(
4
sum
119894=1
119908119894119895119909 (119905 minus 6 times (119894 minus 1)) 119908
119894119895+ 119887119895)]
120597119891 (x 120579)120597119908119894119895
= V119895120593(
20
sum
119894=1
119908119894119895119909 (119905 minus 6 times (119894 minus 1)) + 119887
119895)
times [1 minus 120593(
4
sum
119894=1
119908119894119895119909 (119905 minus 6 times (119894 minus 1)) 119908
119894119895+ 119887119895)]
times 119909 (119905 minus 6 times (119894 minus 1)119908119894119895+ 119887119895)
120597119891 (x 120579)1205971198871015840
= 1
120597119891 (x 120579)120597V119895
= 120593(
4
sum
119894=1
119908119894119895119909 (119905 minus 6 times (119894 minus 1)) 119908
119894119895+ 119887119895)
119894 = 1 2 20 119895 = 1 2 3 4
(24)
The initial values of the parameters are selected randomly
119882119894119895sim 119880 (2 4) 119881
119895sim 119880 (1 2)
119894 = 1 2 21 119895 = 1 2 3 4 5
(25)
The learning curves of the error function and the fittingresult of LMA and GDA are shown in Figures 3 4 5 and 6respectively
The learning curves of LMA and GNA are shown inFigures 3 and 4 respectively The training error sum
3000
119894=1(119910119894minus
119891(x119894 120579))2 of LMA can reach 01 while the final training error
of GDA is more than 90 Furthermore the final mean testerror (1500)sum5500
119894=5001(119910119894minus119891(x119894 120579))2
= 00118 of LMA ismuchsmaller than 02296 which is the final test error of GDA
Discrete Dynamics in Nature and Society 5
10 20 30 40 50 60 70 80 90 10050
100
150
200
250
300
350
400
Training number
Trai
ning
erro
r
Figure 3 The GDA learning curve of the error function
10 20 30 40 50 60 70 80 90 1000
01
02
03
04
05
06
07
08
09
1
Training number
Trai
ning
erro
r
Figure 4 The LMA learning curve of the error function
As far as the fitting effect is concerned the performanceof LMA is much better than that of the GDA This is veryobvious from Figures 5 and 6
All of these suggest that when we predict the Mackey-Glass chaotic time series the performance of LMA is verygood It can effectively overcome the difficulties which mayarise in the GDA
5 Conclusions and Discussions
In this paper we discussed the application of the Levenberg-Marquardt algorithm for the Mackey-Glass chaotic timeseries prediction We used the multilayer perceptron with 20
hidden units to approximate and predict the Mackey-Glasschaotic time series In the process of minimizing the errorfunction we adopted the Levenberg-Marquardt algorithmIf reduction of 119871(120579) is rapid a smaller value damping factor120582 can be used bringing the algorithm closer to the Gauss-Newton algorithm whereas if an iteration gives insufficient
5000 5100 5200 5300 5400 550002
04
06
08
1
12
14
16
Model outputTestSamOut
Figure 5 Fitting of Mackey-Glass chaotic time series GDA
5000 5100 5200 5300 5400 550004
05
06
07
08
09
1
11
12
13
14
Model outputTestSamOut
Figure 6 Fitting of Mackey-Glass chaotic time series LMA
reduction in the residual 120582 can be increased giving astep closer to the gradient descent direction In this paperthe learning mode is batch At last we demonstrate theperformance of the LMA Simulations show that the LMAcanachieve much better prediction efficiency than the gradientdescent method
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
6 Discrete Dynamics in Nature and Society
Acknowledgment
This project is supported by the National Natural ScienceFoundation of China under Grants 61174076 11471152 and61403178
References
[1] M C Mackey and L Glass ldquoOscillation and chaos in physio-logical control systemsrdquo Science vol 197 no 4300 pp 287ndash2891977
[2] L Glass and M C Mackey ldquoMackey-Glass equationrdquo Scholar-pedia vol 5 no 3 p 6908 2010
[3] J Hale Theory of Functional Differential Equations SpringerNew York NY USA 1977
[4] J Mallet-Paret and R D Nussbaum ldquoA differential-delayequation arising in optics and physiologyrdquo SIAM Journal onMathematical Analysis vol 20 no 2 pp 249ndash292 1989
[5] H OWalther ldquoThe 2-dimensional attractor of 119889119909119889119905 = 120583119909 (119905)+
119891 (119909 (119905 minus 1))rdquo Memoirs of the American Mathematical Societyvol 113 no 544 p 76 1995
[6] J Mallet-Paret and G R Sell ldquoThe Poincare-Bendixson theo-rem for monotone cyclic feedback systems with delayrdquo Journalof Differential Equations vol 125 no 2 pp 441ndash489 1996
4 Discrete Dynamics in Nature and Society
3. Application of LMA for Mackey-Glass Chaotic Time Series
In this section, we will derive the LMA when the MLP is used for Mackey-Glass chaotic time series prediction. Suppose that we use $x(t - \Delta T_0), x(t - \Delta T_1), \ldots, x(t - \Delta T_{n-1})$ to predict the future value $x(t + \Delta T)$, where $\Delta T_0 = 0$.
To implement the LMA, what we should do is calculate the Jacobian matrix $\mathbf{J}$, whose $i$th row equals $J_i$. According to (3) and (4), the function $f(\mathbf{x}, \theta)$ can be expressed as
$$
f(\mathbf{x}, \theta) = \sum_{j=1}^{m} v_j \,\varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) + b'. \quad (20)
$$
What we should do is calculate $J_i = \partial f(\mathbf{x}_i, \theta) / \partial \theta$. Since $\varphi$ is the logistic activation, $\varphi'(z) = \varphi(z)(1 - \varphi(z))$, and the derivatives of $f(\mathbf{x}, \theta)$ with respect to $\theta$ are
$$
\begin{aligned}
\frac{\partial f(\mathbf{x}, \theta)}{\partial b_j} &= v_j \,\varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) \left[ 1 - \varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) \right], \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial w_{ij}} &= v_j \,\varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) \left[ 1 - \varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) \right] x(t - \Delta T_{i-1}), \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial b'} &= 1, \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial v_j} &= \varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right), \\
&\quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, m.
\end{aligned} \quad (21)
$$
As we know, $\theta = (b_1, \ldots, b_m, w_{11}, \ldots, w_{n1}, w_{12}, \ldots, w_{n2}, \ldots, w_{1m}, \ldots, w_{nm}, b', v_1, \ldots, v_m) \in R^{(n+1)m+m+1}$, so $J_i$, the $i$th row of $\mathbf{J}$, can easily be obtained according to (21). Once $\mathbf{J}$ is calculated, the LMA for the MLP applied to Mackey-Glass chaotic time series prediction follows directly.
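The row $J_i$ in (21) amounts to one forward pass plus elementwise products. A minimal NumPy sketch, assuming the logistic sigmoid (consistent with the $\varphi(1-\varphi)$ factor in (21)) and the parameter layout of $\theta$ given above; the function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, so that sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w, b, v, b_out):
    """f(x, theta) = sum_j v_j * phi(sum_i w_ij x_i + b_j) + b', as in eq. (20)."""
    h = sigmoid(x @ w + b)          # hidden activations, shape (m,)
    return v @ h + b_out, h

def jacobian_row(x, w, b, v, b_out):
    """Row J_i = df(x, theta)/dtheta, following eq. (21).

    Parameter order matches the text: (b_1..b_m, w_11..w_n1, ..., w_1m..w_nm, b', v_1..v_m).
    """
    _, h = mlp_forward(x, w, b, v, b_out)
    dphi = h * (1.0 - h)            # phi'(z) = phi(z)(1 - phi(z))
    d_b = v * dphi                  # df/db_j
    d_w = np.outer(x, v * dphi)     # df/dw_ij = v_j phi'(.) x_i, shape (n, m)
    d_bout = np.array([1.0])        # df/db'
    d_v = h                         # df/dv_j
    # Flatten column by column (w_11..w_n1, w_12..w_n2, ...) to match theta
    return np.concatenate([d_b, d_w.ravel(order="F"), d_bout, d_v])
```

For $n = 4$ inputs and $m = 20$ hidden units, each row has $(n+1)m + m + 1 = 121$ entries, matching the dimension of $\theta$.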
4. Numerical Simulations
Example 1. We will conduct an experiment to show the efficiency of the Levenberg-Marquardt algorithm. We choose a chaotic time series created by the Mackey-Glass delay differential equation
$$
\frac{dx(t)}{dt} = \frac{0.2\, x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1\, x(t) \quad (22)
$$
for $\tau = 17$.

Such a series has some short-range time coherence, but long-term prediction is very difficult. The need to predict such a time series arises, for example, in detecting arrhythmias in heartbeats.
The network is given no information about the generator of the time series and is asked to predict its future from a few samples of its history. In our example, we trained the network to predict the value at time $T + \Delta T$ from inputs at times $T$, $T - 6$, $T - 12$, and $T - 18$, and we adopt $\Delta T = 50$ here.
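The series in (22) can be generated numerically and sliced into such input-target pairs. A minimal sketch using a simple Euler discretization; the step size and the constant initial history $x_0 = 1.2$ are assumptions for illustration, not settings taken from the paper:

```python
import numpy as np

def mackey_glass(n_points, tau=17, dt=1.0, x0=1.2):
    """Euler integration of eq. (22): dx/dt = 0.2 x(t-tau)/(1 + x(t-tau)^10) - 0.1 x(t)."""
    delay = int(round(tau / dt))
    x = np.empty(n_points + delay)
    x[:delay + 1] = x0                   # constant history on [-tau, 0]
    for i in range(delay, n_points + delay - 1):
        xd = x[i - delay]                # x(t - tau)
        x[i + 1] = x[i] + dt * (0.2 * xd / (1.0 + xd ** 10) - 0.1 * x[i])
    return x[delay:]

def make_samples(series, delta_t=50):
    """Inputs x(T), x(T-6), x(T-12), x(T-18); target x(T + delta_t)."""
    X, y = [], []
    for T in range(18, len(series) - delta_t):
        X.append([series[T], series[T - 6], series[T - 12], series[T - 18]])
        y.append(series[T + delta_t])
    return np.array(X), np.array(y)
```

The paper's 3000 training and 500 test examples can then be taken as consecutive slices of the returned pairs.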
In the simulation, 3000 training examples and 500 test examples are generated by (22). We use the following multilayer perceptron to fit the generated training examples:
$$
x(t + 50) = \sum_{j=1}^{20} V_j \,\varphi\left( \sum_{i=1}^{4} x(t - 6(i-1))\, w_{ij} + b_j \right) + b'; \quad (23)
$$
that is, the number of hidden units is $m = 20$ and the dimension of the input is $n = 4$.
Let $f(\mathbf{x}, \theta) = \sum_{j=1}^{20} V_j \varphi( \sum_{i=1}^{4} x(t - 6(i-1)) w_{ij} + b_j ) + b'$; then we can obtain the following equations according to (21):
$$
\begin{aligned}
\frac{\partial f(\mathbf{x}, \theta)}{\partial b_j} &= V_j \,\varphi\left( \sum_{i=1}^{4} w_{ij}\, x(t - 6(i-1)) + b_j \right) \left[ 1 - \varphi\left( \sum_{i=1}^{4} w_{ij}\, x(t - 6(i-1)) + b_j \right) \right], \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial w_{ij}} &= V_j \,\varphi\left( \sum_{i=1}^{4} w_{ij}\, x(t - 6(i-1)) + b_j \right) \left[ 1 - \varphi\left( \sum_{i=1}^{4} w_{ij}\, x(t - 6(i-1)) + b_j \right) \right] x(t - 6(i-1)), \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial b'} &= 1, \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial V_j} &= \varphi\left( \sum_{i=1}^{4} w_{ij}\, x(t - 6(i-1)) + b_j \right), \\
&\quad i = 1, 2, 3, 4, \; j = 1, 2, \ldots, 20.
\end{aligned} \quad (24)
$$
The initial values of the parameters are selected randomly:
$$
W_{ij} \sim U(2, 4), \quad V_j \sim U(1, 2), \quad i = 1, 2, 3, 4, \; j = 1, 2, \ldots, 20. \quad (25)
$$
The learning curves of the error function and the fitting results of the LMA and GDA are shown in Figures 3, 4, 5, and 6, respectively.

The learning curves of the GDA and LMA are shown in Figures 3 and 4, respectively. The training error $\sum_{i=1}^{3000} (y_i - f(\mathbf{x}_i, \theta))^2$ of the LMA can reach 0.1, while the final training error of the GDA is more than 90. Furthermore, the final mean test error $(1/500) \sum_{i=5001}^{5500} (y_i - f(\mathbf{x}_i, \theta))^2 = 0.0118$ of the LMA is much smaller than 0.2296, the final test error of the GDA.
Figure 3: The GDA learning curve of the error function (training error versus training number).

Figure 4: The LMA learning curve of the error function (training error versus training number).
As far as the fitting effect is concerned, the performance of the LMA is much better than that of the GDA; this is very obvious from Figures 5 and 6.

All of this suggests that, when we predict the Mackey-Glass chaotic time series, the performance of the LMA is very good: it can effectively overcome the difficulties that may arise with the GDA.
Figure 5: Fitting of the Mackey-Glass chaotic time series: GDA (model output versus test samples, t = 5000-5500).

Figure 6: Fitting of the Mackey-Glass chaotic time series: LMA (model output versus test samples, t = 5000-5500).

5. Conclusions and Discussions

In this paper, we discussed the application of the Levenberg-Marquardt algorithm to Mackey-Glass chaotic time series prediction. We used a multilayer perceptron with 20 hidden units to approximate and predict the Mackey-Glass chaotic time series, and in the process of minimizing the error function we adopted the Levenberg-Marquardt algorithm. If the reduction of $L(\theta)$ is rapid, a smaller value of the damping factor $\lambda$ can be used, bringing the algorithm closer to the Gauss-Newton algorithm, whereas if an iteration gives insufficient reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient descent direction. The learning mode in this paper is batch. Finally, we demonstrated the performance of the LMA: simulations show that it can achieve much better prediction accuracy than the gradient descent method.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This project is supported by the National Natural Science Foundation of China under Grants 61174076, 11471152, and 61403178.
References
[1] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, no. 4300, pp. 287–289, 1977.
[2] L. Glass and M. C. Mackey, "Mackey-Glass equation," Scholarpedia, vol. 5, no. 3, p. 6908, 2010.
[3] J. Hale, Theory of Functional Differential Equations, Springer, New York, NY, USA, 1977.
[4] J. Mallet-Paret and R. D. Nussbaum, "A differential-delay equation arising in optics and physiology," SIAM Journal on Mathematical Analysis, vol. 20, no. 2, pp. 249–292, 1989.
[5] H. O. Walther, "The 2-dimensional attractor of dx/dt = −μx(t) + f(x(t − 1))," Memoirs of the American Mathematical Society, vol. 113, no. 544, p. 76, 1995.
[6] J. Mallet-Paret and G. R. Sell, "The Poincaré-Bendixson theorem for monotone cyclic feedback systems with delay," Journal of Differential Equations, vol. 125, no. 2, pp. 441–489, 1996.
[7] B. Lani-Wayda and H.-O. Walther, "Chaotic motion generated by delayed negative feedback, part II: construction of nonlinearities," Mathematische Nachrichten, vol. 180, pp. 181–211, 2000.
[8] G. Röst and J. H. Wu, "Domain-decomposition method for the global dynamics of delay differential equations with unimodal feedback," Proceedings of The Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, vol. 463, no. 2086, pp. 2655–2669, 2007.
[9] M. Awad, H. Pomares, I. Rojas, O. Salameh, and M. Hamdon, "Prediction of time series using RBF neural networks: a new approach of clustering," International Arab Journal of Information Technology, vol. 6, no. 2, pp. 138–143, 2009.
[10] M. Awad, "Chaotic time series prediction using wavelet neural network," Journal of Artificial Intelligence: Theory and Application, vol. 1, no. 3, pp. 73–80, 2010.
[11] I. Lopez-Yanez, L. Sheremetov, and C. Yanez-Marquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, pp. 23–33, 2014.
[12] A. C. Fowler, "Respiratory control and the onset of periodic breathing," Mathematical Modelling of Natural Phenomena, vol. 9, no. 1, pp. 39–57, 2014.
[13] N. Wang, M. J. Er, and M. Han, "Generalized single-hidden layer feedforward networks for regression problems," IEEE Transactions on Neural Networks and Learning Systems, 2014.
[14] N. Wang, "A generalized ellipsoidal basis function based online self-constructing fuzzy neural network," Neural Processing Letters, vol. 34, no. 1, pp. 13–37, 2011.
[15] H. K. Wei, Theory and Method of the Neural Networks Architecture Design, National Defence Industry Press, Beijing, China, 2005.
[16] H. Wei and S.-I. Amari, "Dynamics of learning near singularities in radial basis function networks," Neural Networks, vol. 21, no. 7, pp. 989–1005, 2008.
[17] H. Wei, J. Zhang, F. Cousseau, T. Ozeki, and S.-I. Amari, "Dynamics of learning near singularities in layered networks," Neural Computation, vol. 20, no. 3, pp. 813–843, 2008.
[18] K. Levenberg, "A method for the solution of certain non-linear problems in least squares," Quarterly of Applied Mathematics, vol. 2, pp. 164–168, 1944.
[19] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.
[20] W. Guo, H. Wei, J. Zhao, and K. Zhang, "Averaged learning equations of error-function-based multilayer perceptrons," Neural Computing and Applications, vol. 25, no. 3-4, pp. 825–832, 2014.
[21] P. E. Gill and W. Murray, "Algorithms for the solution of the nonlinear least-squares problem," SIAM Journal on Numerical Analysis, vol. 15, no. 5, pp. 977–992, 1978.
[22] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM Journal on Applied Mathematics, vol. 11, no. 2, pp. 431–441, 1963.