identification of nonlinear space weather models of the van ...ϕ1 jt−1,···...

1
Identification of nonlinear space weather models of the Van Allen radiation belts using Volterra networks M. Taylor 1* , I.A. Daglis 1 , A. Anastasiadis 1 and D. Vassiliadis 2 1 Institute for Space Applications and Remote Sensing(ISARS), National Observatory of Athens(NOA), Metaxa and Vasillis Pavlou Street, Penteli, Athens 15236, Greece. 2 Department of Physics, Hodges Hall, PO Box 6315, West Virginia University, Morgantown, WV 26506-6315, USA. email * : [email protected] Introduction Many efforts have been made to develop general dynamical models of the Van Allen radiation belts based on data alone. Early linear prediction filter studies focused on the response of daily-averaged relativistic electrons at geostationary altitudes[5, 1]. Vassiliadis et al (2005)[11] extended this technique spatially by incorporating SAMPEX electron flux data into linear prediction filters for a broad range of L-shells from 1.1 to 10.0 RE. Nonlinear state space models[6] have provided useful initial results on the timescales involved in modelling the impulse-reponse of the radiation belts. Here, we show how NARMAX models, in conjunction with nonlinear time-delay FIR neural networks (Volterra networks) hold great promise for the development of accurate and fully data-derived space weather specification and forecast tools. Theory FIG. 1. The overall methodology we have adopted is based on using NARMAX models to then contruct equivalent nonlinear Volterra neural networks that benefit from Takens’ Theorem for time-delay embedding combined with the ability of multilayer perceptrons to be universal function approximators. STEP 1: Constructing a taxonomy of nonlinear input- output models We began with a generalisation of the Wold time series decompo- sition [14] having the form, J t = δ + f (Ψ t )+ ε t ˆ J t + ε t (1) where J t is the electron flux time series, ˆ J t = δ + f (Ψ t ) are the model predictions, f is a general function (linear or nonlinear), δ is a constant (zero in the absence of trend), ε t = J t - ˆ J t are the prediction errors and Ψ t is an “information matrix” constructed from lag polynomials Φ p q , N r , lag operators L i , L j , L k and differencing indices d 1 , d 2 and which has the general form [10], Ψ t p (1 - L i ) d 1 J t q ε t + N r (1 - L k ) d 2 I t , (2) corresponding to a Nonlinear AutoRegressive Integrated Moving- Average eXogenous input NARIMAX(p, d 1 ,q,d 2 ,r ) process. Note that in the case of multiple inputs, I t will be a vector I t . The in- formation matrix then contains operators acting on time-delayed (lagged) time series of electron flux J t-p (autoregression AR), lagged equation errors ε t-q (moving-average MA), and lagged in- puts I t-r (eXogenous). The particular class of model chosen de- pends on how exactly Ψ t is defined and on the functional form of f . For example, the nonlinear autoregressive, moving-average, exoge- nous input NARMAX(p, q, r ) process has Ψ t p J t q ε t + N r I t and represents the time series decomposition, J t = δ + f ϕ 1 J t-1 , ··· p J t-p 1 ε t-1 , ··· q ε t-q , (3) η 1 I t-1 , ··· r I t-r + ε t . (4) The table below shows the complete taxonomy of input-output models we have identified with this class. Armed with input-output relations, we now construct neural networks to model f . Function Autoregression order Moving-Average order Inputs Model 1 1 0 0 AR(1)=Random walk 1 p 0 0 AR(p) 1 p 0 r ARX(p, r) 1 0 q 0 MA(q ) 1 0 0 MA()=Wold Decomposition 1 0 q r MAX(q,r) 1 p q 0 ARMA(p, q ) 1 p q r ARMAX(p, q, r) f p 0 0 NAR(p) f p 0 r NARX(p, r) f 0 q 0 NMA(q ) f 0 q r NMAX(q,r) f p q 0 NARMA(p, q ) f p q r NARMAX(p, q, r) STEP 2: Equivalence between input-output models and Volterra Networks Equivalence between any NARIMAX(p, d 1 ,q,d 2 ,r ) process and its Volterra network representation is guaranteed by a special combi- nation of 2 Theorems: 1. Takens’ Theorem[8]: There is a 1-to-1 mapping between a time series and the underlying dynamical state space 2. Universality Theorem[3]: Nonlinear Multilayer Perceptrons are universal and exact function approximators STEP 3: Construction of Volterra networks Feedforward MLPs with lagged inputs create short-term memory and incorporate nonlinear dynamics into the network state space. In the case that neural activation functions are linear then they operate as finite impulse-response (FIR) networks [13]. Linear FIR models already exist in the literature [11]. Here we develop non- linear (sigmoidal activation function) time-delay FIR networks (Volterra networks) based on the NARMAX(p, 0,r ) process, whose general architecture is shown in Figure 2. In this poster we will show initial results of nonlinear autoregressive modelling of the electron flux based on the NAR(p) process as as example of the network operation and capability. FIG. 2. A schematic diagram of the nonlinear time-delay (Volterra) network used in this work. In order to measure the degree of success in reproducing observed values J (t) from the network model ˆ J (t), we used the data-model correlation coefficient C : C = 1 T 1 σ J σ ˆ J T 0 ˆ J (t) - ˆ J (t) J (t) -J (t) dt (5) where J (t) and σ J are the mean and standard deviation of J (t). STEP 4: Identification of the physical model Since the neuron activation function and weights (connections) in Volterra networks are extractable and therefore explicit, the net- work architecture can be converted into equations with known AR and MA coefficients thus providing model equations. In the Results section, we present a spatio-temporal model of the electron flux cal- culated with a NAR(30) process. The 30 lag steps in the FIR filter makes it impractical to write down the resulting model equations here, but Figures 3b), 4) and 5b) suggest that the simulated model is reproducing fairly well, the physics. Discussion All Volterra networks were trained with the Levenberg-Marquardt backpropagation algorithm [7] for 100 epochs and 10 adaptive passes at each step in L-shell altitude (0.1 Earth radii) and over daily-averaged data covering the whole of 1995. These early re- sults suggest that this particular modelling approach is capable of recovering the nonlinear dynamics implicit in the data. References [1] Baker D-N, McPherron R-L, Cayton T-E et al (1990) Linear prediction filter analysis of relativistic electron properties at 6.6 RE. JGR 95:15,133. [2] Baker D-N (2002) How to cope with space weather. Science 297:1486-1487. [3] Hornik J, Stinchombe M, White H (1989) Mutlilayer feedforward networks are universal approximators. Neural Networks 2:359-366. [4] Li X, Baker D-N, Kanekal S-G et al (2001) Long term measurements of radiation belts by SAMPEX and their variations. Geophys Res Lett 28(20):3827-3830. [5] Nagai T (1988) Space weather forecast: prediction of relativistic electron intensity at synchronous orbit. Geophys Res Lett 15(5):425-428. [6] Rigler E-J, Baker D-N (2008) A state-space model of radiation belt electron flux dynamics. J Atmos Solar Terr Phys 70:1797-1809. [7] Rumelhart D-E, McClelland J-L (1986) Parallel distributed processing: explorations in the microstructure of cognition. MIT Press (Cambridge, MA). [8] Takens P (1981) Detecting strange attractors in fluid turbulence, In: Rand D and Young L-S (1981) Dynamical Systems and Turbulence. Springer (Berlin) 898:366-381. [9] Taylor M et al (2009) A taxonomy of input-output models for the radiation belt enegetic electron flux driven by geoeffective parameters. (in preparation). [10] Taylor M, Daglis I-A, Anastasiadis A, Vassiliadis D, Vlahos L (2009) Nonlinear impulse-response net- works for the prediction of radiation belt energetic electron flux. Proc MCNPP Halkidiki, Greece, 15-19 June. [11] Vassiliadis D, Fung SF, Klimas AJ (2005) Solar, interplanetary and magnetospheric parameters for the radiation belt energetic electron flux. J Geophys Res 110:1-12. [12] Vassiliadis D (2007) Forecasting space weather, In: Bothmer V, Daglis I-A (eds) (2007) Space weather: physics and effects. Springer-Praxis (Berlin). [13] Wan E-A (1993) FIR neural networks for autoregressive time series prediction, In: Weigand A, Gershen- feld N (eds) (1993) Proceedings of the NATO advanced workshop on time series prediction and analysis (Santa Fe, NM). Addison-Wesley. [14] Wold H (1954) A Study in the Analysis of Stationary Time Series. Almqvist and Wiksell (Uppsala). Results FIG. 3. The raw dataset comprises daily-averaged values of 1-6MeV electron flux J e over the time interval 01/01/1993- 31/12/2001. The data for 1995 used in this study is indicated. Note that SAMPEX-PET data is available only above L=1.1. FIG. 4. a) The raw electron flux data normalised to the interval [-1,1] and b) initial results from the nonlinear autoregression NAR(30) Volterra network revealing fairly successful modelling of the salient features in this time interval. FIG. 5. The impulse-response function obtained directly from the Volterra network neuron weightings. The main impulse-response region is between L=2.7 and L=4.2 and has a duration of approximately 23 days. FIG. 6. a) The first zero crossing of the autocorrelation function calculated from the raw data is at a lag of 23 days, b) apart from a numerical artifact at the data edge (<L =1.5), the data-model correlation function C peaks in the main impulse-response region, c) the raw data and the network model at the peak of the impulse response at L =3.6. Acknowledgements MT thanks ISARS-NOA for their hospitality and the Greek State Scholarship Foundation (IKY) for financial support.

Upload: others

Post on 22-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Identification of nonlinear space weather models of the Van ...ϕ1 Jt−1,··· ,ϕpJt−p,φ1εt−1,··· ,φqεt−q, (3) η1 It−1,··· ,ηrIt−r i +εt. (4) The table below

Identification of nonlinear space weather models of theVan Allen radiation belts using Volterra networks

M. Taylor1∗, I.A. Daglis1, A. Anastasiadis1 and D. Vassiliadis2

1Institute for Space Applications and Remote Sensing(ISARS), National Observatory of Athens(NOA), Metaxa and Vasillis Pavlou Street, Penteli, Athens 15236, Greece.2Department of Physics, Hodges Hall, PO Box 6315, West Virginia University, Morgantown, WV 26506-6315, USA.

email∗: [email protected]

Introduction

Many efforts have been made to develop general dynamical models of the Van Allen radiation belts based

on data alone. Early linear prediction filter studies focused on the response of daily-averaged relativistic

electrons at geostationary altitudes[5, 1]. Vassiliadis et al (2005)[11] extended this technique spatially by

incorporating SAMPEX electron flux data into linear prediction filters for a broad range of L-shells from 1.1

to 10.0 RE. Nonlinear state space models[6] have provided useful initial results on the timescales involved in

modelling the impulse-reponse of the radiation belts. Here, we show how NARMAX models, in conjunction

with nonlinear time-delay FIR neural networks (Volterra networks) hold great promise for the development

of accurate and fully data-derived space weather specification and forecast tools.

Theory

FIG. 1. The overall methodology we have adopted is based on using NARMAX models to then contruct equivalent

nonlinear Volterra neural networks that benefit from Takens’ Theorem for time-delay embedding combined with the ability of

multilayer perceptrons to be universal function approximators.

STEP 1: Constructing a taxonomy of nonlinear input-output models

We began with a generalisation of the Wold time series decompo-sition [14] having the form,

Jt = δ + f (Ψt) + εt ≡ Jt + εt (1)

where Jt is the electron flux time series, Jt = δ + f (Ψt) are themodel predictions, f is a general function (linear or nonlinear), δis a constant (zero in the absence of trend), εt = Jt − Jt are theprediction errors and Ψt is an “information matrix” constructedfrom lag polynomials Φp, Θq, Nr, lag operators Li, Lj, Lk anddifferencing indices d1, d2 and which has the general form [10],

Ψt = Φp(1− Li)d1Jt + Θqεt + Nr(1− Lk)d2It, (2)

corresponding to a Nonlinear AutoRegressive Integrated Moving-Average eXogenous input NARIMAX(p, d1, q, d2, r) process. Notethat in the case of multiple inputs, It will be a vector It. The in-formation matrix then contains operators acting on time-delayed(lagged) time series of electron flux Jt−p (autoregression AR),lagged equation errors εt−q (moving-average MA), and lagged in-puts It−r (eXogenous). The particular class of model chosen de-pends on how exactly Ψt is defined and on the functional form of f .For example, the nonlinear autoregressive, moving-average, exoge-nous input NARMAX(p, q, r) process has Ψt = ΦpJt+Θqεt+NrItand represents the time series decomposition,

Jt = δ + f[

ϕ1 Jt−1, · · · , ϕpJt−p, φ1εt−1, · · · , φqεt−q, (3)

η1 It−1, · · · , ηrIt−r

]+ εt. (4)

The table below shows the complete taxonomy of input-outputmodels we have identified with this class. Armed with input-outputrelations, we now construct neural networks to model f .

Function Autoregression order Moving-Average order Inputs Model

1 1 0 0 AR(1)=Random walk1 p 0 0 AR(p)1 p 0 r ARX(p, r)1 0 q 0 MA(q)1 0 ∞ 0 MA(∞)=Wold Decomposition1 0 q r MAX(q, r)1 p q 0 ARMA(p, q)1 p q r ARMAX(p, q, r)f p 0 0 NAR(p)f p 0 r NARX(p, r)f 0 q 0 NMA(q)f 0 q r NMAX(q, r)f p q 0 NARMA(p, q)f p q r NARMAX(p, q, r)

STEP 2: Equivalence between input-output modelsand Volterra Networks

Equivalence between any NARIMAX(p, d1, q, d2, r) process and itsVolterra network representation is guaranteed by a special combi-nation of 2 Theorems:

1. Takens’ Theorem[8]: There is a 1-to-1 mapping between a timeseries and the underlying dynamical state space

2. Universality Theorem[3]: Nonlinear Multilayer Perceptrons areuniversal and exact function approximators

STEP 3: Construction of Volterra networks

Feedforward MLPs with lagged inputs create short-term memoryand incorporate nonlinear dynamics into the network state space.In the case that neural activation functions are linear then theyoperate as finite impulse-response (FIR) networks [13]. Linear FIRmodels already exist in the literature [11]. Here we develop non-linear (sigmoidal activation function) time-delay FIR networks(Volterra networks) based on the NARMAX(p, 0, r) process, whosegeneral architecture is shown in Figure 2. In this poster we will showinitial results of nonlinear autoregressive modelling of the electronflux based on the NAR(p) process as as example of the networkoperation and capability.

FIG. 2. A schematic diagram of the nonlinear time-delay (Volterra) network used in this work.

In order to measure the degree of success in reproducing observedvalues J(t) from the network model J(t), we used the data-modelcorrelation coefficient C:

C =1

T

1

σJσJ

∫ T

0

(J(t)−

⟨J(t)

⟩ )(J(t)− 〈J(t)〉

)dt (5)

where 〈J(t)〉 and σJ are the mean and standard deviation of J(t).

STEP 4: Identification of the physical model

Since the neuron activation function and weights (connections) inVolterra networks are extractable and therefore explicit, the net-work architecture can be converted into equations with known ARand MA coefficients thus providing model equations. In the Resultssection, we present a spatio-temporal model of the electron flux cal-culated with a NAR(30) process. The 30 lag steps in the FIR filtermakes it impractical to write down the resulting model equationshere, but Figures 3b), 4) and 5b) suggest that the simulated modelis reproducing fairly well, the physics.

Discussion

All Volterra networks were trained with the Levenberg-Marquardtbackpropagation algorithm [7] for 100 epochs and 10 adaptivepasses at each step in L-shell altitude (0.1 Earth radii) and overdaily-averaged data covering the whole of 1995. These early re-sults suggest that this particular modelling approach is capable ofrecovering the nonlinear dynamics implicit in the data.

References

[1] Baker D-N, McPherron R-L, Cayton T-E et al (1990) Linear prediction filter analysis of relativisticelectron properties at 6.6 RE. JGR 95:15,133.

[2] Baker D-N (2002) How to cope with space weather. Science 297:1486-1487.

[3] Hornik J, Stinchombe M, White H (1989) Mutlilayer feedforward networks are universal approximators.Neural Networks 2:359-366.

[4] Li X, Baker D-N, Kanekal S-G et al (2001) Long term measurements of radiation belts by SAMPEX andtheir variations. Geophys Res Lett 28(20):3827-3830.

[5] Nagai T (1988) Space weather forecast: prediction of relativistic electron intensity at synchronous orbit.Geophys Res Lett 15(5):425-428.

[6] Rigler E-J, Baker D-N (2008) A state-space model of radiation belt electron flux dynamics. J AtmosSolar Terr Phys 70:1797-1809.

[7] Rumelhart D-E, McClelland J-L (1986) Parallel distributed processing: explorations in the microstructureof cognition. MIT Press (Cambridge, MA).

[8] Takens P (1981) Detecting strange attractors in fluid turbulence, In: Rand D and Young L-S (1981)Dynamical Systems and Turbulence. Springer (Berlin) 898:366-381.

[9] Taylor M et al (2009) A taxonomy of input-output models for the radiation belt enegetic electron fluxdriven by geoeffective parameters. (in preparation).

[10] Taylor M, Daglis I-A, Anastasiadis A, Vassiliadis D, Vlahos L (2009) Nonlinear impulse-response net-works for the prediction of radiation belt energetic electron flux. Proc MCNPP Halkidiki, Greece, 15-19June.

[11] Vassiliadis D, Fung SF, Klimas AJ (2005) Solar, interplanetary and magnetospheric parameters for theradiation belt energetic electron flux. J Geophys Res 110:1-12.

[12] Vassiliadis D (2007) Forecasting space weather, In: Bothmer V, Daglis I-A (eds) (2007) Space weather:physics and effects. Springer-Praxis (Berlin).

[13] Wan E-A (1993) FIR neural networks for autoregressive time series prediction, In: Weigand A, Gershen-feld N (eds) (1993) Proceedings of the NATO advanced workshop on time series prediction and analysis(Santa Fe, NM). Addison-Wesley.

[14] Wold H (1954) A Study in the Analysis of Stationary Time Series. Almqvist and Wiksell (Uppsala).

Results

FIG. 3. The raw dataset comprises daily-averaged values of 1-6MeV electron flux Je over the time interval 01/01/1993-

31/12/2001. The data for 1995 used in this study is indicated. Note that SAMPEX-PET data is available only above

L=1.1.

FIG. 4. a) The raw electron flux data normalised to the interval [-1,1] and b) initial results from the nonlinear

autoregression NAR(30) Volterra network revealing fairly successful modelling of the salient features in this time interval.

FIG. 5. The impulse-response function obtained directly from the Volterra network neuron weightings. The main

impulse-response region is between L=2.7 and L=4.2 and has a duration of approximately 23 days.

FIG. 6. a) The first zero crossing of the autocorrelation function calculated from the raw data is at a lag of 23 days,

b) apart from a numerical artifact at the data edge (< L = 1.5), the data-model correlation function C peaks in the main

impulse-response region, c) the raw data and the network model at the peak of the impulse response at L = 3.6.

Acknowledgements

MT thanks ISARS-NOA for their hospitality and the Greek State Scholarship Foundation (IKY) for financial

support.