Research Article: Levenberg-Marquardt Algorithm for Mackey-Glass Chaotic Time Series Prediction
Junsheng Zhao,1 Yongmin Li,2,3 Xingjiang Yu,1 and Xingfang Zhang1
1 School of Mathematics, Liaocheng University, Liaocheng 252059, China
2 School of Science, Huzhou University, Huzhou 313000, China
3 School of Automation, Southeast University, Nanjing 210096, China
Correspondence should be addressed to Junsheng Zhao: zhaojunshshao@163.com
Received 9 August 2014; Accepted 11 October 2014; Published 11 November 2014
Academic Editor: Rongni Yang
Copyright © 2014 Junsheng Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
For decades, Mackey-Glass chaotic time series prediction has attracted increasing attention. When the multilayer perceptron is used to predict the Mackey-Glass chaotic time series, what we should do is minimize the loss function. As is well known, the convergence of the loss function is rapid at the beginning of the learning process, while convergence becomes very slow when the parameter is near the minimum point. In order to overcome these problems, we introduce the Levenberg-Marquardt algorithm (LMA). Firstly, a brief introduction is given to the multilayer perceptron, including its structure and the model approximation method. Secondly, we introduce the LMA and discuss how to implement it. Lastly, an illustrative example is carried out to show the prediction efficiency of the LMA. Simulations show that the LMA can give more accurate predictions than the gradient descent method.
1. Introduction
The Mackey-Glass chaotic time series is generated by the following nonlinear time-delay differential equation:

$$\frac{dx(t)}{dt} = \frac{\beta x(t-\tau)}{1 + x^{n}(t-\tau)} - \gamma x(t), \quad (1)$$
where $\beta$, $\gamma$, $\tau$, and $n$ are real numbers. Depending on the values of the parameters, this equation displays a range of periodic and chaotic dynamics. Such a series has some short-range time coherence, but long-term prediction is very difficult.
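As a concrete illustration, the series defined by (1) can be generated numerically. The sketch below is a minimal approach using a simple Euler scheme with a constant initial history; the function name `mackey_glass`, the step size, and the constant-history choice are our own, and the parameter values match the example used later in Section 4.

```python
import numpy as np

def mackey_glass(beta=0.2, gamma=0.1, n=10, tau=17, dt=0.1, n_samples=500, x0=1.2):
    """Integrate dx/dt = beta*x(t-tau)/(1 + x(t-tau)^n) - gamma*x(t) with a
    simple Euler scheme; the history on [-tau, 0] is held constant at x0."""
    steps_per_sample = int(round(1.0 / dt))
    delay = int(round(tau / dt))
    total = n_samples * steps_per_sample
    x = np.full(total + delay, x0)          # first `delay` entries serve as history
    for t in range(delay, total + delay - 1):
        x_tau = x[t - delay]                # delayed state x(t - tau)
        x[t + 1] = x[t] + dt * (beta * x_tau / (1.0 + x_tau**n) - gamma * x[t])
    return x[delay::steps_per_sample]       # one sample per unit of time

series = mackey_glass()                     # 500 samples of the tau = 17 series
```

A finer step or a higher-order integrator would give a more faithful trajectory; this sketch is only meant to show the structure of the delayed update.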
Originally, Mackey and Glass proposed the following equation to illustrate the appearance of complex dynamics in physiological control systems by way of bifurcations in the dynamics:

$$\frac{dx}{dt} = \frac{\beta x_{\tau}}{1 + x_{\tau}^{n}} - \gamma x, \quad \gamma, \beta, n > 0, \quad (2)$$

where $x_{\tau} = x(t - \tau)$.
They suggested that many physiological disorders, called dynamical diseases, were characterized by changes in qualitative features of dynamics. The qualitative changes of physiological dynamics corresponded mathematically to bifurcations in the dynamics of the system. The bifurcations in the equation dynamics could be induced by changes in the parameters of the system, as might arise from disease or environmental factors such as drugs, or by changes in the structure of the system [1, 2].
The Mackey-Glass equation has also had an impact on more rigorous mathematical studies of delay-differential equations. Methods for the analysis of some properties of delay differential equations, such as the existence of solutions and the stability of equilibria and periodic solutions, had already been developed [3]. However, the existence of chaotic dynamics in delay-differential equations was unknown. Subsequent studies of delay differential equations with monotonic feedback have provided significant insight into the conditions needed for oscillation and the properties of oscillations [4–6]. For delay differential equations with nonmonotonic feedback, mathematical analysis has proven
Hindawi Publishing Corporation, Discrete Dynamics in Nature and Society, Volume 2014, Article ID 193758, 6 pages. http://dx.doi.org/10.1155/2014/193758
[Figure 1: A multilayer perceptron with one hidden layer: input $\mathbf{x} \in R^n$, input-to-hidden weights $W$, hidden biases $b_1, \ldots, b_m$, hidden-to-output weights $V$, output bias $b'$, and output $y$.]
much more difficult. However, rigorous proofs of chaotic dynamics have been obtained for the differential delay equation $dx/dt = g(x(t-1))$ for special classes of the feedback function $g$ [7]. Further, although a proof of chaotic dynamics in the Mackey-Glass equation has still not been found, progress continues on understanding the properties of delay differential equations, such as (2), that contain both exponential decay and nonmonotonic delayed feedback [8]. The study of this equation remains a topic of vigorous research.
Mackey-Glass chaotic time series prediction is a very difficult task. The aim is to predict the future state $x(t + \Delta T)$ using the current and past time series $x(t), x(t-1), \ldots, x(t-n)$ (Figure 2). There is by now a large literature on Mackey-Glass chaotic time series prediction [9–14]. However, as far as prediction accuracy is concerned, most of the results in the literature are not ideal.
In this paper, we predict the Mackey-Glass chaotic time series with the MLP. While minimizing the loss function, we introduce the LMA, which can adjust the convergence speed and obtain good convergence efficiency.
The rest of the paper is organized as follows. In Section 2, we describe the multilayer perceptron. Section 3 introduces the LMA and discusses how to implement it. In Section 4, we give a numerical example to demonstrate the prediction efficiency. Section 5 contains the conclusions and discussions of the paper.
2. Preliminaries
2.1. Multilayer Perceptrons. A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. The multilayer perceptron with only one hidden layer is depicted in Figure 1 [15].
In Figure 1, $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T \in R^n$ is the model input, $y$ is the model output, $W = \{w_{ij}\}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, is the connection weight from $x_i$ to the $j$th hidden unit, $V = \{v_j\}$ is the connection weight from the $j$th
[Figure 2: Mackey-Glass chaotic time series.]
hidden unit to the output unit, and $b_j$, $j = 1, 2, \ldots, m$, and $b'$ are the biases.
The output of the multilayer perceptron described in Figure 1 is

$$y = \sum_{j=1}^{m} v_j h_j + b', \quad (3)$$
and the outputs of the hidden units are

$$h_j = \varphi\Big(\sum_{i=1}^{n} w_{ij} x_i + b_j\Big), \quad (4)$$

respectively, where $\varphi(\cdot)$ is the activation function. We adopt the sigmoid function as the activation function:

$$\varphi(x) = \frac{1}{1 + e^{-x}}, \quad (5)$$
and the derivative of the activation function with respect to $x$ is

$$\varphi'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \quad (6)$$

or, equivalently,

$$\varphi'(x) = \varphi(x)\,(1 - \varphi(x)). \quad (7)$$
The MLP provides a universal method for function approximation and classification [16, 17]. In the case of function approximation, we have a number of observed data $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_L, y_L)$, which are supposed to be generated by

$$y = f_0(\mathbf{x}) + \xi, \quad (8)$$

where $\xi$ is noise, usually subject to a Gaussian distribution with zero mean, and $f_0(\mathbf{x})$ is the unknown true generating function.
Given a set of observed data, sometimes called training examples, we search for the parameters

$$\theta = (b_1, \ldots, b_m, w_{11}, \ldots, w_{n1}, w_{12}, \ldots, w_{n2}, \ldots, w_{1m}, \ldots, w_{nm}, b', v_1, \ldots, v_m)^T \in R^{(n+1)m+m+1} \quad (9)$$

that approximate the teacher function $f_0(\mathbf{x})$ best, where $T$ denotes matrix transposition. A satisfactory model is often obtained by minimizing the mean square error.
One of the serious problems in minimizing the mean square error is that convergence is rapid at the beginning of the learning process, while it becomes very slow in the region of the minimum [18]. In order to overcome this problem, we introduce the Levenberg-Marquardt algorithm (LMA) in the next section.
2.2. The Levenberg-Marquardt Algorithm. In mathematics and computing, the Levenberg-Marquardt algorithm (LMA) [18–20], also known as the damped least-squares (DLS) method, is used to solve nonlinear least squares problems. These minimization problems arise especially in least squares curve fitting.
The LMA interpolates between the Gauss-Newton algorithm (GNA) and the gradient descent algorithm (GDA). As far as robustness is concerned, the LMA performs better than the GNA, which means that in many cases it finds a solution even if it starts very far from the minimum. However, for well-behaved functions and reasonable starting parameters, the LMA tends to be a bit slower than the GNA.
In many real applications of model fitting, we often adopt the LMA. However, like many other fitting algorithms, the LMA finds only a local minimum, which is not always the global minimum.
The least squares curve fitting problem is described as follows. Instead of the unknown true model, a set of $N$ data pairs $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)$ is given. Suppose that $f(\mathbf{x}, \theta)$ is the approximation model and $L(\theta)$ is a loss function, the sum of the squares of the deviations:

$$L(\theta) = \sum_{i=1}^{N} \big[y_i - f(\mathbf{x}_i, \theta)\big]^2. \quad (10)$$

The task of the curve fitting problem is to minimize the above loss function $L(\theta)$ [21].
The LMA is an iterative algorithm, and the parameter $\theta$ is adjusted in each iteration step. Generally speaking, we choose an initial parameter randomly, for example, $\theta_i \sim U(-1, 1)$, $i = 1, 2, \ldots, n$, where $n$ is the dimension of the parameter $\theta$.
The Taylor expansion of the function $f(\mathbf{x}_i, \theta + \Delta\theta)$ is

$$f(\mathbf{x}_i, \theta + \Delta\theta) \approx f(\mathbf{x}_i, \theta) + \frac{\partial f(\mathbf{x}_i, \theta)}{\partial \theta}\,\Delta\theta. \quad (11)$$
As we know, at the minimum $\theta^*$ of the loss function $L(\theta)$, the gradient of $L(\theta)$ with respect to $\theta$ is zero. Substituting (11) into (10), we obtain

$$L(\theta + \Delta\theta) \approx \sum_{i=1}^{N} \big(y_i - f(\mathbf{x}_i, \theta) - J_i \Delta\theta\big)^2, \quad (12)$$

where $J_i = \partial f(\mathbf{x}_i, \theta)/\partial\theta$, or

$$L(\theta + \Delta\theta) \approx \big\| \mathbf{y} - \mathbf{f}(\theta) - \mathbf{J}\Delta\theta \big\|^2. \quad (13)$$
Taking the derivative with respect to $\Delta\theta$ and setting the result to zero gives

$$\big(\mathbf{J}^T \mathbf{J}\big)\,\Delta\theta = \mathbf{J}^T \big(\mathbf{y} - \mathbf{f}(\theta)\big), \quad (14)$$

where $\mathbf{J}$ is the Jacobian matrix whose $i$th row equals $J_i$, and $\mathbf{y}$ and $\mathbf{f}$ are vectors with $i$th components $y_i$ and $f(\mathbf{x}_i, \theta)$, respectively. This is a set of linear equations which can be solved for $\Delta\theta$.
Levenberg's contribution is to replace this equation by a "damped version,"

$$\big(\mathbf{J}^T \mathbf{J} + \lambda I\big)\,\Delta\theta = \mathbf{J}^T \big[\mathbf{y} - \mathbf{f}(\theta)\big], \quad (15)$$

where $I$ is the identity matrix, giving the increment $\Delta\theta$ to the estimated parameter vector $\theta$.
The damping factor $\lambda$ is adjusted at each iteration step. If the loss function $L(\theta)$ decreases rapidly, $\lambda$ is given a small value, and the LMA then behaves like the Gauss-Newton algorithm. If the loss function decreases very slowly, $\lambda$ can be increased, giving a step closer to the gradient descent direction, since

$$\frac{\partial L(\theta)}{\partial \theta} = -2\,\mathbf{J}^T \big[\mathbf{y} - \mathbf{f}(\theta)\big]. \quad (16)$$

Therefore, for large values of $\lambda$, the step is taken approximately in the direction of the negative gradient.
During the iteration, if either the length of the calculated step $\Delta\theta$ or the reduction of $L(\theta)$ at the latest parameter vector $\theta + \Delta\theta$ falls below predefined limits, the iteration stops, and we take the last parameter vector $\theta$ as the final solution.
Levenberg's algorithm has the disadvantage that if the value of the damping factor $\lambda$ is large, the curvature information in $\mathbf{J}^T\mathbf{J}$ is not used at all. Marquardt provided the insight that we can scale each component of the gradient according to the curvature, so that there is larger movement along the directions where the gradient is smaller; this avoids slow convergence in directions of small gradient. Therefore, Marquardt replaced the identity matrix $I$ with the diagonal matrix consisting of the diagonal elements of $\mathbf{J}^T\mathbf{J}$, resulting in the Levenberg-Marquardt algorithm [22]:

$$\big[\mathbf{J}^T\mathbf{J} + \lambda\,\mathrm{diag}\big(\mathbf{J}^T\mathbf{J}\big)\big]\,\Delta\theta = \mathbf{J}^T \big[\mathbf{y} - \mathbf{f}(\theta)\big], \quad (17)$$
and the LMA update is

$$\theta_{t+1} = \theta_t + \Delta\theta, \quad (18)$$

where

$$\Delta\theta = \big[\mathbf{J}^T\mathbf{J} + \lambda\,\mathrm{diag}\big(\mathbf{J}^T\mathbf{J}\big)\big]^{-1} \mathbf{J}^T \big[\mathbf{y} - \mathbf{f}(\theta)\big]. \quad (19)$$
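The iteration (17)-(19), with the damping and stopping rules described above, can be sketched as follows. The accept/reject schedule for $\lambda$ (halve on improvement, double otherwise) and the exact factors are our own choices, since the text does not fix them; the toy exponential model at the bottom is only for illustration.

```python
import numpy as np

def levenberg_marquardt(f, jac, theta, x, y, lam=1e-2, max_iter=200, tol=1e-10):
    """Minimize L(theta) = sum_i (y_i - f(x_i, theta))^2 via the damped
    normal equations (17): [J^T J + lam*diag(J^T J)] dtheta = J^T (y - f)."""
    loss = np.sum((y - f(x, theta))**2)
    for _ in range(max_iter):
        r = y - f(x, theta)                      # residual vector y - f(theta)
        J = jac(x, theta)                        # N x p Jacobian, row i = df(x_i, theta)/dtheta
        A = J.T @ J
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)
        new_loss = np.sum((y - f(x, theta + step))**2)
        if new_loss < loss:                      # rapid reduction: shrink lam (Gauss-Newton-like)
            theta, loss, lam = theta + step, new_loss, 0.5 * lam
        else:                                    # insufficient reduction: grow lam (gradient-descent-like)
            lam *= 2.0
        if np.linalg.norm(step) < tol:           # step length below the predefined limit
            break
    return theta

# toy curve fit: y = a * exp(b * x), true parameters (2, -1.5)
def f(x, th):
    return th[0] * np.exp(th[1] * x)

def jac(x, th):
    return np.column_stack([np.exp(th[1] * x), th[0] * x * np.exp(th[1] * x)])

x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.5 * x)
theta = levenberg_marquardt(f, jac, np.array([1.0, 0.0]), x, y)
```

On this noiseless problem the accepted steps drive $\lambda$ toward zero, so the final iterations are essentially Gauss-Newton steps.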
3. Application of LMA for Mackey-Glass Chaotic Time Series
In this section, we derive the LMA for the case where the MLP is used for Mackey-Glass chaotic time series prediction. Suppose that we use $x(t - \Delta T_0), x(t - \Delta T_1), \ldots, x(t - \Delta T_{n-1})$ to predict the future variable $x(t + \Delta T)$, where $\Delta T_0 = 0$.
To implement the LMA, what we should do is calculate the Jacobian matrix $\mathbf{J}$, whose $i$th row equals $J_i$. According to (3) and (4), the function $f(\mathbf{x}, \theta)$ can be expressed as

$$f(\mathbf{x}, \theta) = \sum_{j=1}^{m} v_j\,\varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big) + b'. \quad (20)$$
What we should do is calculate $J_i = \partial f(\mathbf{x}_i, \theta)/\partial\theta$. The derivatives of $f(\mathbf{x}, \theta)$ with respect to $\theta$ are

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial b_j} = v_j\,\varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big)\Big[1 - \varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big)\Big],$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial w_{ij}} = v_j\,\varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big)\Big[1 - \varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big)\Big]\,x(t - \Delta T_{i-1}),$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial b'} = 1,$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial v_j} = \varphi\Big(\sum_{i=1}^{n} w_{ij}\,x(t - \Delta T_{i-1}) + b_j\Big),$$

$$i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, m. \quad (21)$$
As we know, $\theta = (b_1, \ldots, b_m, w_{11}, \ldots, w_{n1}, w_{12}, \ldots, w_{n2}, \ldots, w_{1m}, \ldots, w_{nm}, b', v_1, \ldots, v_m) \in R^{(n+1)m+m+1}$, so $J_i$, the $i$th row of $\mathbf{J}$, can easily be obtained according to (21). With $\mathbf{J}$ calculated, the LMA for Mackey-Glass chaotic time series prediction with the MLP follows directly.
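The analytic rows of $\mathbf{J}$ given by (21) can be verified against central finite differences. In the sketch below, the helper names (`unpack`, `jacobian_row`) are our own, and $\theta$ is packed as biases, flattened weight block, output bias, and output weights, analogous to (9) (the weight block uses row-major order internally, which is consistent between the two functions being compared).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, n, m):
    """Split theta = (b_1..b_m, W flattened, b', v_1..v_m), analogous to (9)."""
    b = theta[:m]
    W = theta[m:m + n * m].reshape(n, m)
    b_out = theta[m + n * m]
    v = theta[m + n * m + 1:]
    return b, W, b_out, v

def f(x, theta, n, m):
    """MLP output (20) for one input vector x of length n."""
    b, W, b_out, v = unpack(theta, n, m)
    return sigmoid(x @ W + b) @ v + b_out

def jacobian_row(x, theta, n, m):
    """Analytic derivatives (21), concatenated in the order of theta."""
    b, W, b_out, v = unpack(theta, n, m)
    h = sigmoid(x @ W + b)                 # hidden unit outputs, shape (m,)
    d_b = v * h * (1.0 - h)                # df/db_j
    d_W = np.outer(x, d_b)                 # df/dw_ij = d_b[j] * x_i
    return np.concatenate([d_b, d_W.ravel(), [1.0], h])

# compare against central finite differences for n = 4 inputs, m = 20 hidden units
rng = np.random.default_rng(0)
n, m = 4, 20
theta = rng.normal(size=(n + 1) * m + m + 1)   # 121 parameters, as in (9)
x = rng.normal(size=n)
eps = 1e-6
numeric = np.array([(f(x, theta + eps * e, n, m) - f(x, theta - eps * e, n, m)) / (2 * eps)
                    for e in np.eye(theta.size)])
assert np.allclose(jacobian_row(x, theta, n, m), numeric, atol=1e-6)
```

Stacking `jacobian_row` over all training inputs gives the matrix $\mathbf{J}$ needed by the update (19).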
4. Numerical Simulations
Example 1. We conduct an experiment to show the efficiency of the Levenberg-Marquardt algorithm. We choose a chaotic time series created by the Mackey-Glass delay differential equation

$$\frac{dx(t)}{dt} = \frac{0.2\,x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1\,x(t) \quad (22)$$

for $\tau = 17$. Such a series has some short-range time coherence, but long-term prediction is very difficult. The need to predict such a time series arises, for example, in detecting arrhythmias in heartbeats.
The network is given no information about the generator of the time series and is asked to predict the future of the time series from a few samples of its history. In our example, we trained the network to predict the value at time $T + \Delta T$ from inputs at times $T$, $T - 6$, $T - 12$, and $T - 18$, and we adopt $\Delta T = 50$ here.
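The construction of (input, target) pairs described above can be sketched as follows. The helper `make_dataset` is our own; a sine wave stands in for the actual series, which in practice would be generated from equation (22).

```python
import numpy as np

def make_dataset(series, lags=(0, 6, 12, 18), horizon=50):
    """Build (input, target) pairs: inputs x(T), x(T-6), x(T-12), x(T-18)
    and target x(T + 50), following the setup described in the text."""
    max_lag = max(lags)
    X, y = [], []
    for T in range(max_lag, len(series) - horizon):
        X.append([series[T - lag] for lag in lags])
        y.append(series[T + horizon])
    return np.array(X), np.array(y)

# stand-in series; in practice, use data generated from equation (22)
series = np.sin(0.1 * np.arange(600))
X, y = make_dataset(series)
```

Each row of `X` then serves as one input vector $\mathbf{x}_i$ for the MLP, with the corresponding entry of `y` as the training target.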
In the simulation, 3000 training examples and 500 test examples are generated by (22). We use the following multilayer perceptron to fit the generated training examples:
$$x(t + 50) = \sum_{j=1}^{20} v_j\,\varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big) + b'; \quad (23)$$

that is, the number of hidden units is $m = 20$ and the dimension of the input is $n = 4$.
Let $f(\mathbf{x}, \theta) = \sum_{j=1}^{20} v_j\,\varphi\big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\big) + b'$; then we obtain the following according to (21):

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial b_j} = v_j\,\varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big)\Big[1 - \varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big)\Big],$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial w_{ij}} = v_j\,\varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big)\Big[1 - \varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big)\Big]\,x(t - 6(i-1)),$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial b'} = 1,$$

$$\frac{\partial f(\mathbf{x}, \theta)}{\partial v_j} = \varphi\Big(\sum_{i=1}^{4} w_{ij}\,x(t - 6(i-1)) + b_j\Big),$$

$$i = 1, 2, 3, 4, \quad j = 1, 2, \ldots, 20. \quad (24)$$
The initial values of the parameters are selected randomly:

$$W_{ij} \sim U(2, 4), \quad V_j \sim U(1, 2), \quad i = 1, 2, \ldots, 21, \quad j = 1, 2, \ldots, 5. \quad (25)$$
The learning curves of the error function and the fitting results of the LMA and the GDA are shown in Figures 3, 4, 5, and 6, respectively.
The learning curves of the GDA and the LMA are shown in Figures 3 and 4, respectively. The training error $\sum_{i=1}^{3000}(y_i - f(\mathbf{x}_i, \theta))^2$ of the LMA can reach 0.1, while the final training error of the GDA is more than 90. Furthermore, the final mean test error $(1/500)\sum_{i=5001}^{5500}(y_i - f(\mathbf{x}_i, \theta))^2 = 0.0118$ of the LMA is much smaller than 0.2296, the final test error of the GDA.
[Figure 3: The GDA learning curve of the error function.]
[Figure 4: The LMA learning curve of the error function.]
As far as the fitting effect is concerned, the performance of the LMA is much better than that of the GDA; this is evident from Figures 5 and 6.
All of this suggests that when we predict the Mackey-Glass chaotic time series, the performance of the LMA is very good; it can effectively overcome the difficulties that may arise with the GDA.
5. Conclusions and Discussions
In this paper, we discussed the application of the Levenberg-Marquardt algorithm to Mackey-Glass chaotic time series prediction. We used a multilayer perceptron with 20 hidden units to approximate and predict the Mackey-Glass chaotic time series. In the process of minimizing the error function, we adopted the Levenberg-Marquardt algorithm. If the reduction of $L(\theta)$ is rapid, a smaller value of the damping factor $\lambda$ can be used, bringing the algorithm closer to the Gauss-Newton algorithm, whereas if an iteration gives insufficient
[Figure 5: Fitting of the Mackey-Glass chaotic time series (GDA): model output versus test samples.]
[Figure 6: Fitting of the Mackey-Glass chaotic time series (LMA): model output versus test samples.]
reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient descent direction. In this paper, the learning mode is batch. Finally, we demonstrated the performance of the LMA. Simulations show that the LMA can achieve much better prediction efficiency than the gradient descent method.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This project is supported by the National Natural Science Foundation of China under Grants 61174076, 11471152, and 61403178.
References
[1] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, no. 4300, pp. 287–289, 1977.
[2] L. Glass and M. C. Mackey, "Mackey-Glass equation," Scholarpedia, vol. 5, no. 3, p. 6908, 2010.
[3] J. Hale, Theory of Functional Differential Equations, Springer, New York, NY, USA, 1977.
[4] J. Mallet-Paret and R. D. Nussbaum, "A differential-delay equation arising in optics and physiology," SIAM Journal on Mathematical Analysis, vol. 20, no. 2, pp. 249–292, 1989.
[5] H. O. Walther, "The 2-dimensional attractor of $dx/dt = -\mu x(t) + f(x(t-1))$," Memoirs of the American Mathematical Society, vol. 113, no. 544, p. 76, 1995.
[6] J. Mallet-Paret and G. R. Sell, "The Poincaré-Bendixson theorem for monotone cyclic feedback systems with delay," Journal of Differential Equations, vol. 125, no. 2, pp. 441–489, 1996.
[7] B. Lani-Wayda and H.-O. Walther, "Chaotic motion generated by delayed negative feedback, part II: construction of nonlinearities," Mathematische Nachrichten, vol. 180, pp. 181–211, 2000.
[8] G. Röst and J. H. Wu, "Domain-decomposition method for the global dynamics of delay differential equations with unimodal feedback," Proceedings of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, vol. 463, no. 2086, pp. 2655–2669, 2007.
[9] M. Awad, H. Pomares, I. Rojas, O. Salameh, and M. Hamdon, "Prediction of time series using RBF neural networks: a new approach of clustering," International Arab Journal of Information Technology, vol. 6, no. 2, pp. 138–143, 2009.
[10] M. Awad, "Chaotic time series prediction using wavelet neural network," Journal of Artificial Intelligence: Theory and Application, vol. 1, no. 3, pp. 73–80, 2010.
[11] I. Lopez-Yanez, L. Sheremetov, and C. Yanez-Marquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, pp. 23–33, 2014.
[12] A. C. Fowler, "Respiratory control and the onset of periodic breathing," Mathematical Modelling of Natural Phenomena, vol. 9, no. 1, pp. 39–57, 2014.
[13] N. Wang, M. J. Er, and M. Han, "Generalized single-hidden layer feedforward networks for regression problems," IEEE Transactions on Neural Networks and Learning Systems, 2014.
[14] N. Wang, "A generalized ellipsoidal basis function based online self-constructing fuzzy neural network," Neural Processing Letters, vol. 34, no. 1, pp. 13–37, 2011.
[15] H. K. Wei, Theory and Method of the Neural Networks Architecture Design, National Defence Industry Press, Beijing, China, 2005.
[16] H. Wei and S.-I. Amari, "Dynamics of learning near singularities in radial basis function networks," Neural Networks, vol. 21, no. 7, pp. 989–1005, 2008.
[17] H. Wei, J. Zhang, F. Cousseau, T. Ozeki, and S.-I. Amari, "Dynamics of learning near singularities in layered networks," Neural Computation, vol. 20, no. 3, pp. 813–843, 2008.
[18] K. Levenberg, "A method for the solution of certain non-linear problems in least squares," Quarterly of Applied Mathematics, vol. 2, pp. 164–168, 1944.
[19] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.
[20] W. Guo, H. Wei, J. Zhao, and K. Zhang, "Averaged learning equations of error-function-based multilayer perceptrons," Neural Computing and Applications, vol. 25, no. 3-4, pp. 825–832, 2014.
[21] P. E. Gill and W. Murray, "Algorithms for the solution of the nonlinear least-squares problem," SIAM Journal on Numerical Analysis, vol. 15, no. 5, pp. 977–992, 1978.
[22] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM Journal on Applied Mathematics, vol. 11, no. 2, pp. 431–441, 1963.
2 Discrete Dynamics in Nature and Society
W
b1
V
bm
b998400
y
isin Rnx
Figure 1 Multilayer Perceptrons
much more difficult However rigorous proofs for chaoticdynamics have been obtained for the differential delayequation 119889119909119889119905 = 119892(119909(119905 minus 1)) for special classes of thefeedback function 119892 [7] Further although a proof of chaoticdynamics in the Mackey-Glass equation has still not beenfound advances in understanding the properties of delaydifferential equations is going on such as (2) that containboth exponential decay and nonmonotonic delayed feedback[8] The study of this equation remains a topic of vigorousresearch
The Mackey-Glass chaotic time series prediction is avery difficult task The aim is to predict the future state119909(119905 + Δ119879) using the current and the past time series 119909(119905)119909(119905 minus 1) 119909(119905 minus 119899) (Figure 2) Until now there are manyliteratures about the Mackey-Glass chaotic time series pre-diction [9ndash14] However as far as the prediction accuracy isconcerned most of the results in the literature are not ideal
In this paper we will predict the Mackey-Glass chaotictime series by the MLP While minimizing the loss functionwe introduce the LMA which can adjust the convergencespeed and obtain good convergence efficiency
The rest of the paper is organized as follows In Section 2we describe the multilayer perceptron Section 3 introducesthe LMA and discusses how to implement the LMA InSection 4 we give a numerical example to demonstratethe prediction efficiency Section 5 is the conclusions anddiscussions of the paper
2 Preliminaries
21 Multilayer Perceptrons A multilayer perceptron (MLP)is a feedforward artificial neural network model that mapssets of input data onto a set of appropriate outputs A MLPconsists of multiple layers of nodes in a directed graphwith each layer fully connected to the next one Exceptfor the input nodes each node is a neuron (or processingelement) with a nonlinear activation functionThemultilayerperceptron with only one hidden layer is depicted as inFigure 1 [15]
In Figure 1 x = [1199091 1199092 119909
119899]119879
isin 119877119899 is the model
input 119910 is the model output 119882 = 119908119894119895 119894 = 1 2 119899 119895 =
1 2 119898 is the connection weight from the 119909119894to the 119895th
hidden unit 119881 = V119895 is the connection weight from the 119895th
0 100 200 300 400 50004
05
06
07
08
09
1
11
12
13
14
Figure 2 Mackey-Glass chaotic time series
hidden unit to the output unit 119887119895 119895 = 1 2 119898 and 119887
1015840 arethe bias
The output of the multilayer perceptron described inFigure 1 is
119910 =
119898
sum
119895=1
V119895ℎ119895+ 1198871015840
(3)
and the outputs of the hidden units are
ℎ119895= 120593(
119899
sum
119894=1
119908119894119895119909119894+ 119887119895) (4)
respectively where 120593(sdot) is the activation function We willadopt the sigmoid function 120593(119909) as the activation functionfor example
120593 (119909) =1
1 + 119890minus119909 (5)
and the derivative of the activation function with respect to 119909
is
1205931015840
(119909) =119890minus119909
(1 + 119890minus119909)2 (6)
or
1205931015840
(119909) = 120593 (119909) (1 minus 120593 (119909)) (7)
MLP provides a universal method for function approxi-mation and classification [16 17] In the case of the functionapproximation we have a number of observed data (x
1 1199101)
(x2 1199102) (x
119871 119910119871) which are supposed to be generated by
119910 = 1198910(x) + 120585 (8)
where 120585 is noise usually subject to Gaussian distributionwith zero mean and 119891
0(x) is the unknown true generating
function
Discrete Dynamics in Nature and Society 3
Given a set of observed data sometimes called trainingexamples we search for the parameters
120579 = (1198871 119887
119898 11990811 119908
1198991 11990812 119908
1198992 119908
1119898
119908119899119898
1198871015840
V1 V
119898) isin 119877(119899+1)119898+119898+1
(9)
to approximate the teacher function 1198910(x) best where 119879
denotes the matrix transposition A satisfactory model isoften obtained by minimizing the mean square error
One of the serious problems in minimizing the meansquare error is that the convergence speed of the loss functionis rapid in the beginning of the learning process whilethe convergence speed is very slow in the region of theminimum [18] In order to overcome these problems we willintroduce the Levenberg-Marquardt algorithm (LMA) in thenext section
22 The Levenberg-Marquardt Algorithm In mathematicsand computing the Levenberg-Marquardt algorithm (LMA)[18ndash20] also known as the damped least-squares (DLS)method is used to solve nonlinear least squares problemsTheseminimization problems arise especially in least squarescurve fitting
The LMA is interpolates between the Gauss-Newtonalgorithm (GNA) and the gradient descent algorithm (GDA)As far as the robustness is concerned the LMA performsbetter than the GNA which means that in many cases it findsa solution even if it starts very far away from the minimumHowever for well-behaved functions and reasonable startingparameters the LMA tends to be a bit slower than the GNA
In many real applications for solving model fitting prob-lems we often adopt the LMA However like many otherfitting algorithms the LMA finds only a local minimumwhich is always not the global minimum
The least squares curve fitting problem is described asfollows Instead of the unknown true model a set of 119873 pairsof independent variables (x
1 1199101) (x2 1199102) (x
119873 119910119873) are
given Suppose that 119891(119909 120579) is the approximation model and119871(120579) is a loss function which is the sum of the squares of thedeviations
119871 (120579) =
119873
sum
119894=1
[119910119894minus 119891 (x
119894 120579)]2
(10)
The task of curve fitting problem isminimizing the above lossfunction 119871(120579) [21]
The LMA is an iterative algorithm, and the parameter \(\theta\) is adjusted in each iteration step. Generally speaking, we choose an initial parameter randomly, for example, \(\theta_i \sim U(-1, 1)\), \(i = 1, 2, \dots, n\), where \(n\) is the dimension of the parameter \(\theta\).
The Taylor expansion of the function \(f(\mathbf{x}_i, \theta + \Delta\theta)\) is

\[ f(\mathbf{x}_i, \theta + \Delta\theta) \approx f(\mathbf{x}_i, \theta) + \frac{\partial f(\mathbf{x}_i, \theta)}{\partial \theta}\, \Delta\theta. \tag{11} \]
As we know, at the minimum \(\theta^{*}\) of the loss function \(L(\theta)\), the gradient of \(L(\theta)\) with respect to \(\theta\) is zero. Substituting (11) into (10), we obtain

\[ L(\theta + \Delta\theta) \approx \sum_{i=1}^{N} \bigl(y_i - f(\mathbf{x}_i, \theta) - J_i \Delta\theta\bigr)^2, \tag{12} \]
where \(J_i = \partial f(\mathbf{x}_i, \theta)/\partial \theta\), or, in vector form,

\[ L(\theta + \Delta\theta) \approx \bigl\| \mathbf{y} - \mathbf{f}(\theta) - \mathbf{J}\,\Delta\theta \bigr\|^2. \tag{13} \]
Taking the derivative with respect to \(\Delta\theta\) and setting the result to zero gives

\[ \bigl(\mathbf{J}^T \mathbf{J}\bigr)\, \Delta\theta = \mathbf{J}^T \bigl(\mathbf{y} - \mathbf{f}(\theta)\bigr), \tag{14} \]

where \(\mathbf{J}\) is the Jacobian matrix whose \(i\)th row equals \(J_i\), and \(\mathbf{y}\) and \(\mathbf{f}(\theta)\) are vectors with \(i\)th components \(y_i\) and \(f(\mathbf{x}_i, \theta)\), respectively. This is a set of linear equations, which can be solved for \(\Delta\theta\).
Levenberg's contribution is to replace this equation by a "damped version,"

\[ \bigl(\mathbf{J}^T \mathbf{J} + \lambda I\bigr)\, \Delta\theta = \mathbf{J}^T \bigl(\mathbf{y} - \mathbf{f}(\theta)\bigr), \tag{15} \]

where \(I\) is the identity matrix, giving the increment \(\Delta\theta\) to the estimated parameter vector \(\theta\).
The damping factor \(\lambda\) is adjusted at each iteration step. If the loss function \(L(\theta)\) decreases rapidly, \(\lambda\) is given a small value, and the LMA becomes similar to the Gauss-Newton algorithm. If the loss function \(L(\theta)\) decreases very slowly, \(\lambda\) can be increased, giving a step closer to the gradient descent direction, since

\[ \frac{\partial L(\theta)}{\partial \Delta\theta} = -2\,\mathbf{J}^T \bigl(\mathbf{y} - \mathbf{f}(\theta)\bigr). \tag{16} \]

Therefore, for large values of \(\lambda\), the step is taken approximately in the direction of the gradient.
In the iteration process, if either the length of the calculated step \(\Delta\theta\) or the reduction of \(L(\theta)\) from the latest parameter vector \(\theta + \Delta\theta\) falls below predefined limits, the iteration stops, and we take the last parameter vector \(\theta\) as the final solution.
Levenberg's algorithm has the disadvantage that if the value of the damping factor \(\lambda\) is large, the curvature information in \(\mathbf{J}^T\mathbf{J}\) is effectively not used at all. Marquardt provided the insight that we can scale each component of the gradient according to the curvature, so that there is larger movement along the directions where the gradient is smaller. This avoids slow convergence in directions of small gradient. Therefore, Marquardt replaced the identity matrix \(I\) with the diagonal matrix consisting of the diagonal elements of \(\mathbf{J}^T\mathbf{J}\), resulting in the Levenberg-Marquardt algorithm [22]:

\[ \bigl[\mathbf{J}^T \mathbf{J} + \lambda\, \operatorname{diag}(\mathbf{J}^T \mathbf{J})\bigr]\, \Delta\theta = \mathbf{J}^T \bigl(\mathbf{y} - \mathbf{f}(\theta)\bigr), \tag{17} \]
and the LMA update is as follows:

\[ \theta_{t+1} = \theta_t + \Delta\theta, \tag{18} \]

where

\[ \Delta\theta = \bigl[\mathbf{J}^T \mathbf{J} + \lambda\, \operatorname{diag}(\mathbf{J}^T \mathbf{J})\bigr]^{-1} \mathbf{J}^T \bigl(\mathbf{y} - \mathbf{f}(\theta_t)\bigr). \tag{19} \]
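The update (17)–(19), together with the damping adjustment described above, can be sketched as follows. The multiply/divide-by-10 schedule, the function names, and the exponential usage example are our illustrative choices; the paper does not specify them:

```python
import numpy as np

def levenberg_marquardt(f, jac, theta0, xs, ys, lam=1e-3, max_iter=100, tol=1e-10):
    """Sketch of the LMA with Marquardt's diagonal scaling, eqs. (17)-(19).

    f(xs, theta) -> model outputs (N,); jac(xs, theta) -> Jacobian J (N, p).
    lam is decreased after a successful step (toward Gauss-Newton) and
    increased after a failed one (toward gradient descent)."""
    theta = np.asarray(theta0, dtype=float)
    loss = float(np.sum((ys - f(xs, theta)) ** 2))
    for _ in range(max_iter):
        J = jac(xs, theta)
        r = ys - f(xs, theta)
        A = J.T @ J
        # damped normal equations: [J^T J + lam*diag(J^T J)] dtheta = J^T r
        delta = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)
        new_theta = theta + delta
        new_loss = float(np.sum((ys - f(xs, new_theta)) ** 2))
        if new_loss < loss:                    # accept the step
            improvement = loss - new_loss
            theta, loss, lam = new_theta, new_loss, lam / 10
            if improvement < tol or np.linalg.norm(delta) < tol:
                break                          # stopping rule from the text
        else:                                  # reject: increase damping
            lam *= 10
    return theta

# usage: fit y = a*exp(b*x) on noiseless synthetic data
xs = np.linspace(0.0, 1.0, 50)
ys = 2.0 * np.exp(-1.5 * xs)
f = lambda x, th: th[0] * np.exp(th[1] * x)
jac = lambda x, th: np.stack([np.exp(th[1] * x),
                              th[0] * x * np.exp(th[1] * x)], axis=1)
theta = levenberg_marquardt(f, jac, [1.0, 0.0], xs, ys)
```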
4 Discrete Dynamics in Nature and Society
3 Application of LMA for Mackey-GlassChaotic Time Series
In this section we will derive the LMAwhen theMLP is usedfor the Mackey-Glass chaotic time series prediction Supposethat we use the 119909(119905) minus Δ119879
0 119909(119905 minus Δ119879
1) 119909(119905 minus Δ119879
119899minus1) to
predict the future variable 119909(119905 + Δ119879) where Δ1198790= 0
To implement the LMA, what we should do is calculate the Jacobian matrix \(\mathbf{J}\), whose \(i\)th row equals \(J_i\). According to (3) and (4), the function \(f(\mathbf{x}, \theta)\) can be expressed as

\[ f(\mathbf{x}, \theta) = \sum_{j=1}^{m} v_j\, \varphi\!\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) + b'. \tag{20} \]
What we should do is calculate \(J_i = \partial f(\mathbf{x}_i, \theta)/\partial \theta\). The derivatives of \(f(\mathbf{x}, \theta)\) with respect to \(\theta\) are

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial b_j} = v_j\,\varphi\!\left(\sum_{i=1}^{n} w_{ij}\,x(t-\Delta T_{i-1}) + b_j\right)\left[1-\varphi\!\left(\sum_{i=1}^{n} w_{ij}\,x(t-\Delta T_{i-1}) + b_j\right)\right], \]

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial w_{ij}} = v_j\,\varphi\!\left(\sum_{i=1}^{n} w_{ij}\,x(t-\Delta T_{i-1}) + b_j\right)\left[1-\varphi\!\left(\sum_{i=1}^{n} w_{ij}\,x(t-\Delta T_{i-1}) + b_j\right)\right] x(t-\Delta T_{i-1}), \]

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial b'} = 1, \qquad \frac{\partial f(\mathbf{x},\theta)}{\partial v_j} = \varphi\!\left(\sum_{i=1}^{n} w_{ij}\,x(t-\Delta T_{i-1}) + b_j\right), \tag{21} \]

for \(i = 1, 2, \dots, n\) and \(j = 1, 2, \dots, m\).
As we know, \(\theta = (b_1,\dots,b_m,\; w_{11},\dots,w_{n1},\; w_{12},\dots,w_{n2},\;\dots,\; w_{1m},\dots,w_{nm},\; b',\; v_1,\dots,v_m) \in \mathbb{R}^{(n+1)m+m+1}\), so \(J_i\), the \(i\)th row of \(\mathbf{J}\), can easily be obtained according to (21). Once \(\mathbf{J}\) is calculated, the LMA for MLP-based Mackey-Glass chaotic time series prediction follows.
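One row of \(\mathbf{J}\) can be assembled from (21) as below, assuming \(\varphi\) is the logistic sigmoid (consistent with the \(\varphi(1-\varphi)\) factor in (21)); the function name, the parameter ordering, and the finite-difference check are our illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_jacobian_row(x, W, b, v, b_out):
    """One row J_i for f(x) = sum_j v_j*phi(W[:, j] @ x + b_j) + b',
    using eq. (21) and phi' = phi*(1 - phi) for the logistic sigmoid.
    Row ordered as (df/db, df/dW flattened, df/db', df/dv)."""
    h = sigmoid(W.T @ x + b)           # hidden activations, shape (m,)
    g = v * h * (1.0 - h)              # common factor v_j*phi*(1-phi)
    d_b = g                            # df/db_j
    d_W = np.outer(x, g)               # df/dw_ij = g_j * x_i, shape (n, m)
    d_bout = np.array([1.0])           # df/db'
    d_v = h                            # df/dv_j
    return np.concatenate([d_b, d_W.ravel(), d_bout, d_v])

# sanity check: compare df/dv_1 against a finite difference
rng = np.random.default_rng(0)
n, m = 4, 3
W, b = rng.normal(size=(n, m)), rng.normal(size=m)
v, b_out = rng.normal(size=m), 0.5
x = rng.normal(size=n)
f = lambda v_: v_ @ sigmoid(W.T @ x + b) + b_out
eps = 1e-6
v1 = v.copy(); v1[0] += eps
fd = (f(v1) - f(v)) / eps
row = mlp_jacobian_row(x, W, b, v, b_out)
```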
4 Numerical Simulations
Example 1. We conduct an experiment to show the efficiency of the Levenberg-Marquardt algorithm. We choose a chaotic time series created by the Mackey-Glass delay differential equation

\[ \frac{dx(t)}{dt} = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t) \tag{22} \]

for \(\tau = 17\).

Such a series has some short-range time coherence, but long-term prediction is very difficult. The need to predict such a time series arises, for example, in detecting arrhythmias in heartbeats.
The network is given no information about the generator of the time series and is asked to predict the future of the time series from a few samples of its history. In our example, we train the network to predict the value at time \(T + \Delta T\) from inputs at times \(T\), \(T-6\), \(T-12\), and \(T-18\), and we adopt \(\Delta T = 50\).
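The series (22) and the lagged training pairs described above can be generated as follows. The paper does not state its integration scheme, so the simple Euler step with \(dt = 1\), the initial condition, and the function names are our assumptions:

```python
import numpy as np

def mackey_glass(n_steps, tau=17, beta=0.2, gamma=0.1, x0=1.2, dt=1.0):
    """Integrate eq. (22) with a forward Euler scheme (illustrative choice)."""
    hist = int(tau / dt)
    x = np.full(n_steps + hist, x0)    # constant history on [-tau, 0]
    for t in range(hist, n_steps + hist - 1):
        delayed = x[t - hist]
        x[t + 1] = x[t] + dt * (beta * delayed / (1.0 + delayed ** 10)
                                - gamma * x[t])
    return x[hist:]

def make_examples(series, lags=(0, 6, 12, 18), horizon=50):
    """Inputs x(T), x(T-6), x(T-12), x(T-18); target x(T+50)."""
    start, stop = max(lags), len(series) - horizon
    X = np.stack([[series[t - d] for d in lags] for t in range(start, stop)])
    y = series[start + horizon: stop + horizon]
    return X, y

series = mackey_glass(4000)
X, y = make_examples(series)
```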
In the simulation, 3000 training examples and 500 test examples are generated by (22). We use the following multilayer perceptron to fit the generated training examples:

\[ x(t+50) = \sum_{j=1}^{20} V_j\, \varphi\!\left( \sum_{i=1}^{4} x(t - 6(i-1))\, w_{ij} + b_j \right) + b'; \tag{23} \]

that is, the number of hidden units is \(m = 20\) and the dimension of the input is \(n = 4\).
Let \(f(\mathbf{x}, \theta) = \sum_{j=1}^{20} V_j \varphi(\sum_{i=1}^{4} x(t - 6(i-1)) w_{ij} + b_j) + b'\). Then we obtain the following equations according to (21):

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial b_j} = V_j\,\varphi\!\left(\sum_{i=1}^{4} w_{ij}\,x(t-6(i-1)) + b_j\right)\left[1-\varphi\!\left(\sum_{i=1}^{4} w_{ij}\,x(t-6(i-1)) + b_j\right)\right], \]

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial w_{ij}} = V_j\,\varphi\!\left(\sum_{i=1}^{4} w_{ij}\,x(t-6(i-1)) + b_j\right)\left[1-\varphi\!\left(\sum_{i=1}^{4} w_{ij}\,x(t-6(i-1)) + b_j\right)\right] x(t-6(i-1)), \]

\[ \frac{\partial f(\mathbf{x},\theta)}{\partial b'} = 1, \qquad \frac{\partial f(\mathbf{x},\theta)}{\partial V_j} = \varphi\!\left(\sum_{i=1}^{4} w_{ij}\,x(t-6(i-1)) + b_j\right), \tag{24} \]

for \(i = 1, 2, 3, 4\) and \(j = 1, 2, \dots, 20\).
The initial values of the parameters are selected randomly:

\[ W_{ij} \sim U(2, 4), \quad V_j \sim U(1, 2), \quad i = 1, 2, 3, 4,\; j = 1, 2, \dots, 20. \tag{25} \]
The learning curves of the error function and the fitting results of the LMA and GDA are shown in Figures 3, 4, 5, and 6, respectively.

The learning curves of the GDA and LMA are shown in Figures 3 and 4, respectively. The training error \(\sum_{i=1}^{3000} (y_i - f(\mathbf{x}_i, \theta))^2\) of the LMA can reach 0.1, while the final training error of the GDA is more than 90. Furthermore, the final mean test error of the LMA, \((1/500)\sum_{i=5001}^{5500} (y_i - f(\mathbf{x}_i, \theta))^2 = 0.0118\), is much smaller than 0.2296, the final test error of the GDA.
Figure 3: The GDA learning curve of the error function (training error versus training number).
Figure 4: The LMA learning curve of the error function (training error versus training number).
As far as the fitting effect is concerned, the performance of the LMA is much better than that of the GDA, as is evident from Figures 5 and 6.
All of this suggests that, when predicting the Mackey-Glass chaotic time series, the performance of the LMA is very good; it can effectively overcome the difficulties that may arise with the GDA.
5 Conclusions and Discussions

In this paper, we discussed the application of the Levenberg-Marquardt algorithm to Mackey-Glass chaotic time series prediction. We used a multilayer perceptron with 20 hidden units to approximate and predict the Mackey-Glass chaotic time series. In the process of minimizing the error function, we adopted the Levenberg-Marquardt algorithm. If the reduction of \(L(\theta)\) is rapid, a smaller value of the damping factor \(\lambda\) can be used, bringing the algorithm closer to the Gauss-Newton algorithm, whereas if an iteration gives insufficient
Figure 5: Fitting of the Mackey-Glass chaotic time series by the GDA (model output versus test sample output).
Figure 6: Fitting of the Mackey-Glass chaotic time series by the LMA (model output versus test sample output).
reduction in the residual, \(\lambda\) can be increased, giving a step closer to the gradient descent direction. In this paper, the learning mode is batch. Finally, we demonstrated the performance of the LMA. Simulations show that the LMA can achieve much better prediction accuracy than the gradient descent method.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
6 Discrete Dynamics in Nature and Society
Acknowledgment
This project is supported by the National Natural Science Foundation of China under Grants 61174076, 11471152, and 61403178.
References
[1] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, no. 4300, pp. 287–289, 1977.
[2] L. Glass and M. C. Mackey, "Mackey-Glass equation," Scholarpedia, vol. 5, no. 3, p. 6908, 2010.
[3] J. Hale, Theory of Functional Differential Equations, Springer, New York, NY, USA, 1977.
[4] J. Mallet-Paret and R. D. Nussbaum, "A differential-delay equation arising in optics and physiology," SIAM Journal on Mathematical Analysis, vol. 20, no. 2, pp. 249–292, 1989.
[5] H. O. Walther, "The 2-dimensional attractor of \(dx/dt = -\mu x(t) + f(x(t-1))\)," Memoirs of the American Mathematical Society, vol. 113, no. 544, p. 76, 1995.
[6] J. Mallet-Paret and G. R. Sell, "The Poincaré-Bendixson theorem for monotone cyclic feedback systems with delay," Journal of Differential Equations, vol. 125, no. 2, pp. 441–489, 1996.
[7] B. Lani-Wayda and H.-O. Walther, "Chaotic motion generated by delayed negative feedback, part II: construction of nonlinearities," Mathematische Nachrichten, vol. 180, pp. 181–211, 2000.
[8] G. Rost and J. H. Wu, "Domain-decomposition method for the global dynamics of delay differential equations with unimodal feedback," Proceedings of The Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, vol. 463, no. 2086, pp. 2655–2669, 2007.
[9] M. Awad, H. Pomares, I. Rojas, O. Salameh, and M. Hamdon, "Prediction of time series using RBF neural networks: a new approach of clustering," International Arab Journal of Information Technology, vol. 6, no. 2, pp. 138–143, 2009.
[10] M. Awad, "Chaotic time series prediction using wavelet neural network," Journal of Artificial Intelligence: Theory and Application, vol. 1, no. 3, pp. 73–80, 2010.
[11] I. Lopez-Yanez, L. Sheremetov, and C. Yanez-Marquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, pp. 23–33, 2014.
[12] A. C. Fowler, "Respiratory control and the onset of periodic breathing," Mathematical Modelling of Natural Phenomena, vol. 9, no. 1, pp. 39–57, 2014.
[13] N. Wang, M. J. Er, and M. Han, "Generalized single-hidden layer feedforward networks for regression problems," IEEE Transactions on Neural Networks and Learning Systems, 2014.
[14] N. Wang, "A generalized ellipsoidal basis function based online self-constructing fuzzy neural network," Neural Processing Letters, vol. 34, no. 1, pp. 13–37, 2011.
[15] H. K. Wei, Theory and Method of the Neural Networks Architecture Design, National Defence Industry Press, Beijing, China, 2005.
[16] H. Wei and S.-I. Amari, "Dynamics of learning near singularities in radial basis function networks," Neural Networks, vol. 21, no. 7, pp. 989–1005, 2008.
[17] H. Wei, J. Zhang, F. Cousseau, T. Ozeki, and S.-I. Amari, "Dynamics of learning near singularities in layered networks," Neural Computation, vol. 20, no. 3, pp. 813–843, 2008.
[18] K. Levenberg, "A method for the solution of certain non-linear problems in least squares," Quarterly of Applied Mathematics, vol. 2, pp. 164–168, 1944.
[19] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.
[20] W. Guo, H. Wei, J. Zhao, and K. Zhang, "Averaged learning equations of error-function-based multilayer perceptrons," Neural Computing and Applications, vol. 25, no. 3-4, pp. 825–832, 2014.
[21] P. E. Gill and W. Murray, "Algorithms for the solution of the nonlinear least-squares problem," SIAM Journal on Numerical Analysis, vol. 15, no. 5, pp. 977–992, 1978.
[22] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM Journal on Applied Mathematics, vol. 11, no. 2, pp. 431–441, 1963.
Discrete Dynamics in Nature and Society 3
Given a set of observed data sometimes called trainingexamples we search for the parameters
120579 = (1198871 119887
119898 11990811 119908
1198991 11990812 119908
1198992 119908
1119898
119908119899119898
1198871015840
V1 V
119898) isin 119877(119899+1)119898+119898+1
(9)
to approximate the teacher function 1198910(x) best where 119879
denotes the matrix transposition A satisfactory model isoften obtained by minimizing the mean square error
One of the serious problems in minimizing the meansquare error is that the convergence speed of the loss functionis rapid in the beginning of the learning process whilethe convergence speed is very slow in the region of theminimum [18] In order to overcome these problems we willintroduce the Levenberg-Marquardt algorithm (LMA) in thenext section
22 The Levenberg-Marquardt Algorithm In mathematicsand computing the Levenberg-Marquardt algorithm (LMA)[18ndash20] also known as the damped least-squares (DLS)method is used to solve nonlinear least squares problemsTheseminimization problems arise especially in least squarescurve fitting
The LMA is interpolates between the Gauss-Newtonalgorithm (GNA) and the gradient descent algorithm (GDA)As far as the robustness is concerned the LMA performsbetter than the GNA which means that in many cases it findsa solution even if it starts very far away from the minimumHowever for well-behaved functions and reasonable startingparameters the LMA tends to be a bit slower than the GNA
In many real applications for solving model fitting prob-lems we often adopt the LMA However like many otherfitting algorithms the LMA finds only a local minimumwhich is always not the global minimum
The least squares curve fitting problem is described asfollows Instead of the unknown true model a set of 119873 pairsof independent variables (x
1 1199101) (x2 1199102) (x
119873 119910119873) are
given Suppose that 119891(119909 120579) is the approximation model and119871(120579) is a loss function which is the sum of the squares of thedeviations
119871 (120579) =
119873
sum
119894=1
[119910119894minus 119891 (x
119894 120579)]2
(10)
The task of curve fitting problem isminimizing the above lossfunction 119871(120579) [21]
The LMA is an iterative algorithm and the parameter120579 is adjusted in each iteration step Generally speaking wechoose an initial parameter randomly for example 120579
119894sim
119880(minus1 1) 119894 = 1 2 119899 where 119899 is the dimension ofparameter 120579
The Taylor expansion of the function 119891(x119894 120579 + Δ120579) is
119891 (x119894 120579 + Δ120579) asymp 119891 (x
119894 120579) +
120597119891 (x119894 120579)
120597120579Δ120579 (11)
As we know at the minimum 120579lowast of loss function 119871(120579) the
gradient of 119871(120579) with respect to 120579 will be zero Substituting(11) into (10) we can obtain
119871 (120579 + Δ120579) asymp
119873
sum
119894=1
(119910119894minus 119891 (x
119894 120579) minus J
119894Δ120579)2
(12)
where 119869119894= 120597119891(x
119894 120579)120597120579 or
119871 (120579 + Δ120579) asymp1003817100381710038171003817y minus f(120579) minus JΔ1205791003817100381710038171003817
2
(13)
Taking the derivative with respect toΔ120579 and setting the resultto zero give
(J119879J) Δ120579 = J119879 (y minus f (120579)) (14)
where J is the Jacobian matrix whose 119894th row equals 119869119894and
also y and f are vectors with 119894th component 119891(119909119894 120579) and 119910
119894
respectively This is a set of linear equations which can besolved for Δ120579
Levenbergrsquos contribution is to replace this equation by aldquodamped versionrdquo
(J119879J + 120582119868) Δ120579 = J119879 [y minus f (120579)] (15)
where 119868 is the identity matrix giving the increment Δ120579 to theestimated parameter vector 120579
The damping factor 120582 is adjusted at each iteration step Ifthe loss function 119871(120579) reduces rapidly 120582 will adopt a smallvalue and then the LMA is similar to the Gauss-Newtonalgorithm While the loss function 119871(120579) reduces very slowly120582 can be increased giving a step closer to the gradient descentdirection and
120597119871 (120579)
120597Δ120579= minus2J119879 [y minus f (120579)]119879 (16)
Therefore for large values of 120582 the step will be takenapproximately in the direction of the gradient
In the process of iteration if either the length of thecalculated step Δ120579 or the reduction of 119871(120579) from the latestparameter vector 120579 + Δ120579 falls below the predefined limitsiteration process stops and then we take the last parametervector 120579 as the final solution
Levenbergrsquos algorithm has the disadvantage that if thevalue of damping factor 120582 is large the inverse of J119879J + 120582119868
does not work at all Marquardt provided the insight thatwe can scale each component of the gradient according tothe curvature so that there is larger movement along thedirections where the gradient is smaller This avoids slowconvergence in the direction of small gradient ThereforeMarquardt replaced the identity matrix 119868 with the diagonalmatrix consisting of the diagonal elements of J119879J resulting inthe Levenberg-Marquardt algorithm [22]
[J119879J + 120582 diag (J119879J)] Δ120579 = J119879 [y minus f (120579)] (17)
and the LMA is as follows
120579119905+1
= 120579119905+ Δ120579 (18)
where
Δ120579 = [J119879J + 120582 diag (J119879J) ]minus1
J119879 [y minus f (120579)] (19)
4 Discrete Dynamics in Nature and Society
3 Application of LMA for Mackey-GlassChaotic Time Series
In this section we will derive the LMAwhen theMLP is usedfor the Mackey-Glass chaotic time series prediction Supposethat we use the 119909(119905) minus Δ119879
0 119909(119905 minus Δ119879
1) 119909(119905 minus Δ119879
119899minus1) to
predict the future variable 119909(119905 + Δ119879) where Δ1198790= 0
To implement the LMA what we should do is calculatethe Jacobian matrix J whose 119894th row equals 119869
119894 According to
(3) and (4) function 119891(x 120579) can be expressed as
119891 (x 120579) =119898
sum
119895=1
V119895120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895) + 1198871015840
(20)
What we should do is calculate the 119869119894= 120597119891(x
119894 120579)120597120579
The derivatives of 119891(x 120579) with respect to 120579 are
120597119891 (x 120579)120597119887119895
= V119895120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895)
times [1 minus 120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895)]
120597119891 (x 120579)120597119908119894119895
= V119895120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895)
times [1 minus 120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895)]
times 119909 (119905 minus Δ119879119894minus1
)
120597119891 (x 120579)1205971198871015840
= 1
120597119891 (x 120579)120597V119895
= 120593(
119899
sum
119894=1
119908119894119895119909 (119905 minus Δ119879
119894minus1) + 119887119895)
119894 = 1 2 119899 119895 = 1 2 119898
(21)
As we know 120579 = (1198871 119887
119898 11990811 119908
1198991 11990812
1199081198992 119908
1119898 119908
119899119898 1198871015840
V1 V
119898) isin 119877(119899+1)119898+119898+1 so 119869
119894 the
119894th row of J can be easily obtained according to (21) J iscalculated and when MLP is used for Mackey-Glass chaotictime series prediction the LMA can also be obtained
4 Numerical Simulations
Example 1 We will conduct an experiment to show theefficiency of the Levenberg-Marquardt algorithmWe choosea chaotic time series created by the Mackey-Glass delay-difference equation
119889119909 (119905)
119889119905=
02119909 (119905 minus 120591)
1 + 11990910 (119905 minus 120591)minus 01119909 (119905) (22)
for 120591 = 17Such a series has some short-range time coherence but
long-term prediction is very difficult The need to predict
such a time series arises in detecting arrhythmias in heart-beats
The network is given no information about the generatorof the time series and is asked to predict the future of the timeseries from a few samples of the history of the time series Inour example we trained the network to predict the value attime 119879+Δ119879 from inputs at time 119879 119879minus 6 119879minus 12 and 119879minus 18and we will adopt Δ119879 = 50 here
In the simulation 3000 training examples and 500 testexamples are generated by (22) We use the following mul-tilayer perceptron for fitting the generated training examples
119909 (119905 + 50) =
20
sum
119895=1
119881119895120593(
4
sum
119894=1
119909 (119905 minus 6 (119894 minus 1)) 119908119894119895+ 119887119895) + 1198871015840
(23)
for example the number of the hidden units is 119899 = 20 and thedimension of the input is119898 = 4
Let 119891(x 120579) = sum20
119895=1119881119895120593(sum4
119894=1119909(119905 minus 6(119894 minus 1))119908
119894119895+ 119887119895) + 1198871015840
then we can obtain the following equation according to (21)
120597119891 (x 120579)120597119887119895
= V119895120593(
20
sum
119894=1
119908119894119895119909 (119905 minus 6 times (119894 minus 1)) 119908
119894119895+ 119887119895)
times [1 minus 120593(
4
sum
119894=1
119908119894119895119909 (119905 minus 6 times (119894 minus 1)) 119908
119894119895+ 119887119895)]
120597119891 (x 120579)120597119908119894119895
= V119895120593(
20
sum
119894=1
119908119894119895119909 (119905 minus 6 times (119894 minus 1)) + 119887
119895)
times [1 minus 120593(
4
sum
119894=1
119908119894119895119909 (119905 minus 6 times (119894 minus 1)) 119908
119894119895+ 119887119895)]
times 119909 (119905 minus 6 times (119894 minus 1)119908119894119895+ 119887119895)
120597119891 (x 120579)1205971198871015840
= 1
120597119891 (x 120579)120597V119895
= 120593(
4
sum
119894=1
119908119894119895119909 (119905 minus 6 times (119894 minus 1)) 119908
119894119895+ 119887119895)
119894 = 1 2 20 119895 = 1 2 3 4
(24)
The initial values of the parameters are selected randomly
119882119894119895sim 119880 (2 4) 119881
119895sim 119880 (1 2)
119894 = 1 2 21 119895 = 1 2 3 4 5
(25)
The learning curves of the error function and the fittingresult of LMA and GDA are shown in Figures 3 4 5 and 6respectively
The learning curves of LMA and GNA are shown inFigures 3 and 4 respectively The training error sum
3000
119894=1(119910119894minus
119891(x119894 120579))2 of LMA can reach 01 while the final training error
of GDA is more than 90 Furthermore the final mean testerror (1500)sum5500
119894=5001(119910119894minus119891(x119894 120579))2
= 00118 of LMA ismuchsmaller than 02296 which is the final test error of GDA
Discrete Dynamics in Nature and Society 5
10 20 30 40 50 60 70 80 90 10050
100
150
200
250
300
350
400
Training number
Trai
ning
erro
r
Figure 3 The GDA learning curve of the error function
10 20 30 40 50 60 70 80 90 1000
01
02
03
04
05
06
07
08
09
1
Training number
Trai
ning
erro
r
Figure 4 The LMA learning curve of the error function
As far as the fitting effect is concerned the performanceof LMA is much better than that of the GDA This is veryobvious from Figures 5 and 6
All of these suggest that when we predict the Mackey-Glass chaotic time series the performance of LMA is verygood It can effectively overcome the difficulties which mayarise in the GDA
5 Conclusions and Discussions
In this paper we discussed the application of the Levenberg-Marquardt algorithm for the Mackey-Glass chaotic timeseries prediction We used the multilayer perceptron with 20
hidden units to approximate and predict the Mackey-Glasschaotic time series In the process of minimizing the errorfunction we adopted the Levenberg-Marquardt algorithmIf reduction of 119871(120579) is rapid a smaller value damping factor120582 can be used bringing the algorithm closer to the Gauss-Newton algorithm whereas if an iteration gives insufficient
5000 5100 5200 5300 5400 550002
04
06
08
1
12
14
16
Model outputTestSamOut
Figure 5 Fitting of Mackey-Glass chaotic time series GDA
5000 5100 5200 5300 5400 550004
05
06
07
08
09
1
11
12
13
14
Model outputTestSamOut
Figure 6 Fitting of Mackey-Glass chaotic time series LMA
reduction in the residual 120582 can be increased giving astep closer to the gradient descent direction In this paperthe learning mode is batch At last we demonstrate theperformance of the LMA Simulations show that the LMAcanachieve much better prediction efficiency than the gradientdescent method
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
6 Discrete Dynamics in Nature and Society
Acknowledgment
This project is supported by the National Natural ScienceFoundation of China under Grants 61174076 11471152 and61403178
References
[1] M C Mackey and L Glass ldquoOscillation and chaos in physio-logical control systemsrdquo Science vol 197 no 4300 pp 287ndash2891977
[2] L Glass and M C Mackey ldquoMackey-Glass equationrdquo Scholar-pedia vol 5 no 3 p 6908 2010
[3] J Hale Theory of Functional Differential Equations SpringerNew York NY USA 1977
[4] J Mallet-Paret and R D Nussbaum ldquoA differential-delayequation arising in optics and physiologyrdquo SIAM Journal onMathematical Analysis vol 20 no 2 pp 249ndash292 1989
[5] H OWalther ldquoThe 2-dimensional attractor of 119889119909119889119905 = 120583119909 (119905)+
119891 (119909 (119905 minus 1))rdquo Memoirs of the American Mathematical Societyvol 113 no 544 p 76 1995
[6] J Mallet-Paret and G R Sell ldquoThe Poincare-Bendixson theo-rem for monotone cyclic feedback systems with delayrdquo Journalof Differential Equations vol 125 no 2 pp 441ndash489 1996
4 Discrete Dynamics in Nature and Society
3. Application of LMA for Mackey-Glass Chaotic Time Series
In this section, we will derive the LMA when the MLP is used for Mackey-Glass chaotic time series prediction. Suppose that we use $x(t - \Delta T_0), x(t - \Delta T_1), \ldots, x(t - \Delta T_{n-1})$ to predict the future value $x(t + \Delta T)$, where $\Delta T_0 = 0$.
To implement the LMA, what we should do is calculate the Jacobian matrix $\mathbf{J}$, whose $i$th row equals $J_i$. According to (3) and (4), the function $f(\mathbf{x}, \theta)$ can be expressed as
$$
f(\mathbf{x}, \theta) = \sum_{j=1}^{m} v_j \,\varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) + b'. \quad (20)
$$
What we should do is calculate $J_i = \partial f(\mathbf{x}_i, \theta) / \partial \theta$. Since $\varphi$ is the logistic activation, $\varphi'(z) = \varphi(z)(1 - \varphi(z))$, and the derivatives of $f(\mathbf{x}, \theta)$ with respect to $\theta$ are
$$
\begin{aligned}
\frac{\partial f(\mathbf{x}, \theta)}{\partial b_j} &= v_j \,\varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) \left[ 1 - \varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) \right], \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial w_{ij}} &= v_j \,\varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) \left[ 1 - \varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right) \right] x(t - \Delta T_{i-1}), \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial b'} &= 1, \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial v_j} &= \varphi\left( \sum_{i=1}^{n} w_{ij}\, x(t - \Delta T_{i-1}) + b_j \right), \\
&\quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, m.
\end{aligned} \quad (21)
$$
As we know, $\theta = (b_1, \ldots, b_m, w_{11}, \ldots, w_{n1}, w_{12}, \ldots, w_{n2}, \ldots, w_{1m}, \ldots, w_{nm}, b', v_1, \ldots, v_m) \in R^{(n+1)m+m+1}$, so $J_i$, the $i$th row of $\mathbf{J}$, can easily be obtained according to (21). Once $\mathbf{J}$ is calculated, the LMA for the MLP applied to Mackey-Glass chaotic time series prediction follows directly.
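The row $J_i$ in (21) amounts to one forward pass plus elementwise products. A minimal NumPy sketch, assuming the logistic sigmoid (consistent with the $\varphi(1-\varphi)$ factor in (21)) and the parameter layout of $\theta$ given above; the function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, so that sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w, b, v, b_out):
    """f(x, theta) = sum_j v_j * phi(sum_i w_ij x_i + b_j) + b', as in eq. (20)."""
    h = sigmoid(x @ w + b)          # hidden activations, shape (m,)
    return v @ h + b_out, h

def jacobian_row(x, w, b, v, b_out):
    """Row J_i = df(x, theta)/dtheta, following eq. (21).

    Parameter order matches the text: (b_1..b_m, w_11..w_n1, ..., w_1m..w_nm, b', v_1..v_m).
    """
    _, h = mlp_forward(x, w, b, v, b_out)
    dphi = h * (1.0 - h)            # phi'(z) = phi(z)(1 - phi(z))
    d_b = v * dphi                  # df/db_j
    d_w = np.outer(x, v * dphi)     # df/dw_ij = v_j phi'(.) x_i, shape (n, m)
    d_bout = np.array([1.0])        # df/db'
    d_v = h                         # df/dv_j
    # Flatten column by column (w_11..w_n1, w_12..w_n2, ...) to match theta
    return np.concatenate([d_b, d_w.ravel(order="F"), d_bout, d_v])
```

For $n = 4$ inputs and $m = 20$ hidden units, each row has $(n+1)m + m + 1 = 121$ entries, matching the dimension of $\theta$.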
4. Numerical Simulations
Example 1. We will conduct an experiment to show the efficiency of the Levenberg-Marquardt algorithm. We choose a chaotic time series created by the Mackey-Glass delay differential equation
$$
\frac{dx(t)}{dt} = \frac{0.2\, x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1\, x(t) \quad (22)
$$
for $\tau = 17$.

Such a series has some short-range time coherence, but long-term prediction is very difficult. The need to predict such a time series arises, for example, in detecting arrhythmias in heartbeats.
The network is given no information about the generator of the time series and is asked to predict its future from a few samples of its history. In our example, we trained the network to predict the value at time $T + \Delta T$ from inputs at times $T$, $T - 6$, $T - 12$, and $T - 18$, and we adopt $\Delta T = 50$ here.
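The series in (22) can be generated numerically and sliced into such input-target pairs. A minimal sketch using a simple Euler discretization; the step size and the constant initial history $x_0 = 1.2$ are assumptions for illustration, not settings taken from the paper:

```python
import numpy as np

def mackey_glass(n_points, tau=17, dt=1.0, x0=1.2):
    """Euler integration of eq. (22): dx/dt = 0.2 x(t-tau)/(1 + x(t-tau)^10) - 0.1 x(t)."""
    delay = int(round(tau / dt))
    x = np.empty(n_points + delay)
    x[:delay + 1] = x0                   # constant history on [-tau, 0]
    for i in range(delay, n_points + delay - 1):
        xd = x[i - delay]                # x(t - tau)
        x[i + 1] = x[i] + dt * (0.2 * xd / (1.0 + xd ** 10) - 0.1 * x[i])
    return x[delay:]

def make_samples(series, delta_t=50):
    """Inputs x(T), x(T-6), x(T-12), x(T-18); target x(T + delta_t)."""
    X, y = [], []
    for T in range(18, len(series) - delta_t):
        X.append([series[T], series[T - 6], series[T - 12], series[T - 18]])
        y.append(series[T + delta_t])
    return np.array(X), np.array(y)
```

The paper's 3000 training and 500 test examples can then be taken as consecutive slices of the returned pairs.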
In the simulation, 3000 training examples and 500 test examples are generated by (22). We use the following multilayer perceptron to fit the generated training examples:
$$
x(t + 50) = \sum_{j=1}^{20} V_j \,\varphi\left( \sum_{i=1}^{4} x(t - 6(i-1))\, w_{ij} + b_j \right) + b'; \quad (23)
$$
that is, the number of hidden units is $m = 20$ and the dimension of the input is $n = 4$.
Let $f(\mathbf{x}, \theta) = \sum_{j=1}^{20} V_j \varphi( \sum_{i=1}^{4} x(t - 6(i-1)) w_{ij} + b_j ) + b'$; then we can obtain the following equations according to (21):
$$
\begin{aligned}
\frac{\partial f(\mathbf{x}, \theta)}{\partial b_j} &= V_j \,\varphi\left( \sum_{i=1}^{4} w_{ij}\, x(t - 6(i-1)) + b_j \right) \left[ 1 - \varphi\left( \sum_{i=1}^{4} w_{ij}\, x(t - 6(i-1)) + b_j \right) \right], \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial w_{ij}} &= V_j \,\varphi\left( \sum_{i=1}^{4} w_{ij}\, x(t - 6(i-1)) + b_j \right) \left[ 1 - \varphi\left( \sum_{i=1}^{4} w_{ij}\, x(t - 6(i-1)) + b_j \right) \right] x(t - 6(i-1)), \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial b'} &= 1, \\
\frac{\partial f(\mathbf{x}, \theta)}{\partial V_j} &= \varphi\left( \sum_{i=1}^{4} w_{ij}\, x(t - 6(i-1)) + b_j \right), \\
&\quad i = 1, 2, 3, 4, \; j = 1, 2, \ldots, 20.
\end{aligned} \quad (24)
$$
The initial values of the parameters are selected randomly:
$$
W_{ij} \sim U(2, 4), \quad V_j \sim U(1, 2), \quad i = 1, 2, 3, 4, \; j = 1, 2, \ldots, 20. \quad (25)
$$
The learning curves of the error function and the fitting results of the LMA and GDA are shown in Figures 3, 4, 5, and 6, respectively.

The learning curves of the GDA and LMA are shown in Figures 3 and 4, respectively. The training error $\sum_{i=1}^{3000} (y_i - f(\mathbf{x}_i, \theta))^2$ of the LMA can reach 0.1, while the final training error of the GDA is more than 90. Furthermore, the final mean test error $(1/500) \sum_{i=5001}^{5500} (y_i - f(\mathbf{x}_i, \theta))^2 = 0.0118$ of the LMA is much smaller than 0.2296, the final test error of the GDA.
Figure 3: The GDA learning curve of the error function (training error versus training number).

Figure 4: The LMA learning curve of the error function (training error versus training number).
As far as the fitting effect is concerned, the performance of the LMA is much better than that of the GDA; this is very obvious from Figures 5 and 6.

All of this suggests that, when we predict the Mackey-Glass chaotic time series, the performance of the LMA is very good: it can effectively overcome the difficulties that may arise with the GDA.
Figure 5: Fitting of the Mackey-Glass chaotic time series: GDA (model output versus test samples, t = 5000-5500).

Figure 6: Fitting of the Mackey-Glass chaotic time series: LMA (model output versus test samples, t = 5000-5500).

5. Conclusions and Discussions

In this paper, we discussed the application of the Levenberg-Marquardt algorithm to Mackey-Glass chaotic time series prediction. We used a multilayer perceptron with 20 hidden units to approximate and predict the Mackey-Glass chaotic time series, and in the process of minimizing the error function we adopted the Levenberg-Marquardt algorithm. If the reduction of $L(\theta)$ is rapid, a smaller value of the damping factor $\lambda$ can be used, bringing the algorithm closer to the Gauss-Newton algorithm, whereas if an iteration gives insufficient reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient descent direction. The learning mode in this paper is batch. Finally, we demonstrated the performance of the LMA: simulations show that it can achieve much better prediction accuracy than the gradient descent method.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This project is supported by the National Natural Science Foundation of China under Grants 61174076, 11471152, and 61403178.
References
[1] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, no. 4300, pp. 287–289, 1977.
[2] L. Glass and M. C. Mackey, "Mackey-Glass equation," Scholarpedia, vol. 5, no. 3, p. 6908, 2010.
[3] J. Hale, Theory of Functional Differential Equations, Springer, New York, NY, USA, 1977.
[4] J. Mallet-Paret and R. D. Nussbaum, "A differential-delay equation arising in optics and physiology," SIAM Journal on Mathematical Analysis, vol. 20, no. 2, pp. 249–292, 1989.
[5] H. O. Walther, "The 2-dimensional attractor of dx/dt = −μx(t) + f(x(t − 1))," Memoirs of the American Mathematical Society, vol. 113, no. 544, p. 76, 1995.
[6] J. Mallet-Paret and G. R. Sell, "The Poincaré-Bendixson theorem for monotone cyclic feedback systems with delay," Journal of Differential Equations, vol. 125, no. 2, pp. 441–489, 1996.
[7] B. Lani-Wayda and H.-O. Walther, "Chaotic motion generated by delayed negative feedback, part II: construction of nonlinearities," Mathematische Nachrichten, vol. 180, pp. 181–211, 2000.
[8] G. Röst and J. H. Wu, "Domain-decomposition method for the global dynamics of delay differential equations with unimodal feedback," Proceedings of The Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, vol. 463, no. 2086, pp. 2655–2669, 2007.
[9] M. Awad, H. Pomares, I. Rojas, O. Salameh, and M. Hamdon, "Prediction of time series using RBF neural networks: a new approach of clustering," International Arab Journal of Information Technology, vol. 6, no. 2, pp. 138–143, 2009.
[10] M. Awad, "Chaotic time series prediction using wavelet neural network," Journal of Artificial Intelligence: Theory and Application, vol. 1, no. 3, pp. 73–80, 2010.
[11] I. Lopez-Yanez, L. Sheremetov, and C. Yanez-Marquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, pp. 23–33, 2014.
[12] A. C. Fowler, "Respiratory control and the onset of periodic breathing," Mathematical Modelling of Natural Phenomena, vol. 9, no. 1, pp. 39–57, 2014.
[13] N. Wang, M. J. Er, and M. Han, "Generalized single-hidden layer feedforward networks for regression problems," IEEE Transactions on Neural Networks and Learning Systems, 2014.
[14] N. Wang, "A generalized ellipsoidal basis function based online self-constructing fuzzy neural network," Neural Processing Letters, vol. 34, no. 1, pp. 13–37, 2011.
[15] H. K. Wei, Theory and Method of the Neural Networks Architecture Design, National Defence Industry Press, Beijing, China, 2005.
[16] H. Wei and S.-I. Amari, "Dynamics of learning near singularities in radial basis function networks," Neural Networks, vol. 21, no. 7, pp. 989–1005, 2008.
[17] H. Wei, J. Zhang, F. Cousseau, T. Ozeki, and S.-I. Amari, "Dynamics of learning near singularities in layered networks," Neural Computation, vol. 20, no. 3, pp. 813–843, 2008.
[18] K. Levenberg, "A method for the solution of certain non-linear problems in least squares," Quarterly of Applied Mathematics, vol. 2, pp. 164–168, 1944.
[19] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.
[20] W. Guo, H. Wei, J. Zhao, and K. Zhang, "Averaged learning equations of error-function-based multilayer perceptrons," Neural Computing and Applications, vol. 25, no. 3-4, pp. 825–832, 2014.
[21] P. E. Gill and W. Murray, "Algorithms for the solution of the nonlinear least-squares problem," SIAM Journal on Numerical Analysis, vol. 15, no. 5, pp. 977–992, 1978.
[22] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM Journal on Applied Mathematics, vol. 11, no. 2, pp. 431–441, 1963.