
Sampling Techniques for Computational Statistical Physics

Benedict Leimkuhler¹ and Gabriel Stoltz²
¹Edinburgh University School of Mathematics, Edinburgh, Scotland, UK
²Université Paris Est, CERMICS, Projet MICMAC Ecole des Ponts ParisTech – INRIA, Marne-la-Vallée, France

Mathematics Subject Classification

82B05; 82-08; 65C05; 37M05

Short Definition

The computation of macroscopic properties, as predicted by the laws of statistical physics, requires sampling phase-space configurations distributed according to the probability measure at hand. Typically, approximations are obtained as time averages over trajectories of discrete dynamics, which can be shown to be ergodic in some cases. Arguably, the greatest interest is in sampling the canonical (constant temperature) ensemble, although other distributions (isobaric, microcanonical, etc.) are also of interest. Focusing on the case of the canonical measure, three important types of methods can be distinguished: (1) Markov chain methods based on the Metropolis–Hastings algorithm; (2) discretizations of continuous stochastic differential equations which are appropriate modifications and/or limiting cases of the Hamiltonian dynamics; and (3) deterministic dynamics on an extended phase space.

Description

Applications of sampling methods arise most commonly in molecular dynamics and polymer modeling, but they are increasingly encountered in fluid dynamics and other areas. In this article, we focus on the treatment of systems of particles described by position and momentum vectors $q$ and $p$, respectively, and modeled by a Hamiltonian energy $H = H(q,p)$.

Macroscopic properties of materials are obtained, according to the laws of statistical physics, as the average of some function with respect to a probability measure $\mu$ describing the state of the system (→ Calculation of Ensemble Averages):

$$E_\mu(A) = \int_{\mathcal{E}} A(q,p)\, \mu(dq\, dp). \qquad (1)$$

In practice, averages such as (1) are obtained by generating, by an appropriate numerical method, a sequence of microscopic configurations $(q^i, p^i)_{i \ge 0}$ such that

$$\lim_{n \to +\infty} \frac{1}{n} \sum_{i=0}^{n-1} A(q^i, p^i) = \int_{\mathcal{E}} A(q,p)\, \mu(dq\, dp). \qquad (2)$$

The Canonical Case
For simplicity, we consider the case of the canonical measure:


$$\mu(dq\, dp) = Z_\mu^{-1}\, e^{-\beta H(q,p)}\, dq\, dp, \qquad Z_\mu = \int_{\mathcal{E}} e^{-\beta H(q,p)}\, dq\, dp, \qquad (3)$$

where $\beta^{-1} = k_B T$. Many sampling methods designed for the canonical ensemble can be extended or adapted to sample other ensembles.

If the Hamiltonian is separable (i.e., it is the sum of a quadratic kinetic energy and a potential energy), as is usually the case when Cartesian coordinates are used, the measure (3) has a tensorized form, and the components of the momenta are distributed according to independent Gaussian distributions. It is therefore straightforward to sample the kinetic part of the canonical measure. The real difficulty consists in sampling positions distributed according to the canonical measure

$$\nu(dq) = Z_\nu^{-1}\, e^{-\beta V(q)}\, dq, \qquad Z_\nu = \int_{\mathcal{D}} e^{-\beta V(q)}\, dq, \qquad (4)$$

which is typically a high-dimensional distribution with many concentrated local modes. For this reason, many sampling methods focus on sampling the configurational part $\nu$ of the canonical measure.

Since most concepts needed for sampling purposes can be used either in the configuration space or in the phase space, the following notation will be used: the state of the system is denoted by $x \in \mathcal{S} \subset \mathbb{R}^d$, which can be the position space $q \in \mathcal{D}$ (and then $d = 3N$) or the full phase space $(q,p) \in \mathcal{E}$ with $d = 6N$. The measure $\pi(dx)$ is the canonical distribution to be sampled ($\nu$ in configuration space, $\mu$ in phase space).

General Classification
From a mathematical point of view, most sampling methods may be classified as (see [2]):
1. "Direct" probabilistic methods, such as the standard rejection method, which generate independent and identically distributed (i.i.d.) configurations
2. Markov chain techniques
3. Markovian stochastic dynamics
4. Purely deterministic methods on an extended phase space
Although the division described above is useful to bear in mind, there is a blurring of the lines between the different types of methods used in practice, with Markov chains being constructed from Hamiltonian dynamics, or degenerate diffusive processes being added to deterministic models to improve sampling efficiencies.

Direct probabilistic methods are typically based on a prior probability measure used to sample configurations, which are then accepted or rejected according to some criterion (as for the rejection method, for instance). Usually, a prior probability measure which is easy to sample should be used. However, due to the high dimensionality of the problem, it is extremely difficult to design a prior sufficiently close to the canonical distribution to achieve a reasonable acceptance rate. Direct probabilistic methods are therefore rarely used in practice.

Markov Chain Methods
Markov chain methods are mostly based on the Metropolis–Hastings algorithm [5, 13], which is a widely used method in molecular simulation. The prior required in direct probabilistic methods is replaced by a proposal move which generates a new configuration from a former one. This new configuration is then accepted or rejected according to a criterion ensuring that the correct measure is sampled. Here again, designing a relevant proposal move is the cornerstone of the method, and this proposal depends crucially on the model at hand.

The Metropolis–Hastings Algorithm
The Metropolis–Hastings algorithm generates a Markov chain of the system configurations $(x^n)_{n \ge 0}$ having the distribution of interest $\pi(dx)$ as a stationary distribution. The invariant distribution $\pi$ has to be known only up to a multiplicative constant for this algorithm to be applicable (which is the case for the canonical measure and its marginal in position). It consists in a two-step procedure, starting from a given initial condition $x^0$:
1. Propose a new state $\tilde{x}^{n+1}$ from $x^n$ according to the proposition kernel $T(x^n, \cdot)$;
2. Accept the proposal with probability
$$\min\left(1,\ \frac{\pi(\tilde{x}^{n+1})\, T(\tilde{x}^{n+1}, x^n)}{\pi(x^n)\, T(x^n, \tilde{x}^{n+1})}\right),$$
and set in this case $x^{n+1} = \tilde{x}^{n+1}$; otherwise, set $x^{n+1} = x^n$.
Note that when a proposal is rejected, the current configuration must be counted again: a given configuration may thus appear several times in the sequence.
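As an illustration, the following minimal Python sketch (not from the original article; the double-well potential and all parameter values are purely illustrative choices) implements the algorithm above for the configurational measure $\nu(dq) \propto e^{-\beta V(q)}$ with a symmetric Gaussian random-walk proposal:

```python
import numpy as np

def metropolis(V, q0, beta=1.0, step=0.5, n_steps=10_000, rng=None):
    """Random-walk Metropolis chain targeting nu(dq) ∝ exp(-beta*V(q))."""
    rng = np.random.default_rng() if rng is None else rng
    q = np.array(q0, dtype=float)
    v = V(q)
    samples = np.empty((n_steps,) + q.shape)
    for n in range(n_steps):
        q_new = q + step * rng.standard_normal(q.shape)  # symmetric proposal T
        v_new = V(q_new)
        # acceptance probability min(1, exp(-beta*(V(q')-V(q))))
        if rng.random() < np.exp(min(0.0, -beta * (v_new - v))):
            q, v = q_new, v_new
        samples[n] = q  # a rejected step repeats (re-counts) the current state
    return samples

# Illustrative target: a one-dimensional double-well potential.
traj = metropolis(lambda q: float(np.sum((q**2 - 1.0)**2)), q0=[0.0], beta=3.0)
print(traj.mean(), traj.var())
```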

The original Metropolis algorithm was proposed in [13] and relied on symmetric proposals in the configuration space. It was later extended in [5] to allow for nonsymmetric propositions, which can bias proposals toward higher-probability regions with respect to the target distribution $\pi$. The algorithm is simple to interpret in the case of a symmetric proposition kernel on the configuration space ($\pi(dq) \propto e^{-\beta V(q)}\, dq$ and $T(q, q') = T(q', q)$). The Metropolis–Hastings ratio is simply

$$r(q, q') = \exp\bigl(-\beta\,(V(q') - V(q))\bigr).$$

If the proposed move has a lower energy, it is always accepted, which allows the chain to visit the states of higher probability more frequently. On the other hand, transitions to less likely states of higher energies are not forbidden (they are merely accepted less often), which is important in order to observe transitions from one metastable region to another when these regions are separated by some energy barrier.

Properties of the Algorithm
The probability transition kernel of the Metropolis–Hastings chain reads

$$P(x, dx') = \min\bigl(1, r(x, x')\bigr)\, T(x, dx') + (1 - \alpha(x))\, \delta_x(dx'), \qquad (5)$$

where $\alpha(x) \in [0,1]$ is the probability to accept a move starting from $x$ (considering all possible propositions):

$$\alpha(x) = \int_{\mathcal{S}} \min\bigl(1, r(x,y)\bigr)\, T(x, dy).$$

The first part of the transition kernel corresponds to the accepted transitions from $x$ to $x'$, which occur with probability $\min(1, r(x,x'))$, while the term $(1 - \alpha(x))\, \delta_x(dx')$ encodes all the rejected steps.

A simple computation shows that the Metropolis–Hastings transition kernel $P$ is reversible with respect to $\pi$, namely, $P(x, dx')\, \pi(dx) = P(x', dx)\, \pi(dx')$. This implies that the measure $\pi$ is an invariant measure. To conclude that the algorithm is pathwise ergodic in the sense of (2) (relying on the results of [14]), it remains to check whether the chain is (aperiodically) irreducible, i.e., whether any state can be reached from any other one in a finite number of steps. This property depends on the proposal kernel $T$ and should be checked for the model under consideration.

Besides determining the theoretical convergence of the algorithm, the proposal kernel is also a key element in devising efficient algorithms. It is observed in practice that the optimal acceptance/rejection rate, in terms of the variance of the estimator (a mean of some observable over a trajectory), for example, is often around 0.5, ensuring some balance between:
• large moves that decorrelate the iterates when they are accepted (hence reducing the correlations in the chain, which makes the convergence happen faster) but lead to high rejection rates (and thus degenerate samples, since the same position may be counted several times); and
• small moves that are rejected less often but do not decorrelate the iterates much.
This trade-off between small and large proposal moves has been investigated rigorously in some simple cases in [16, 17], where optimal acceptance rates are obtained in a limiting regime.

Some Examples of Proposition Kernels
The simplest transition kernels are based on random walks. For instance, it is possible to modify the current configuration by a random perturbation applied to all particles. The problem with such symmetric proposals is that they may not be well suited to the target probability measure (creating very correlated successive configurations when the perturbation is small, or very unlikely moves when it is large). Efficient nonsymmetric proposal moves are often based on discretizations of continuous stochastic dynamics which use a biasing term such as $-\nabla V$ to ensure that the dynamics remains sufficiently close to the minima of the potential.

An interesting proposal relies on the Hamiltonian dynamics itself and consists in (1) sampling new momenta $p^n$ according to the kinetic part of the canonical measure; (2) performing one or several steps of the Verlet scheme starting from the previous position $q^n$, obtaining a proposed configuration $(\tilde{q}^{n+1}, \tilde{p}^{n+1})$; and (3) computing $r^n = \exp[-\beta(H(\tilde{q}^{n+1}, \tilde{p}^{n+1}) - H(q^n, p^n))]$ and accepting the proposed position with probability $\min(1, r^n)$. This algorithm is known as the Hybrid Monte Carlo algorithm (first introduced in [3] and analyzed from a mathematical viewpoint in [2, 20]).
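A minimal sketch of one Hybrid Monte Carlo step, assuming unit masses ($M = \mathrm{Id}$) and a user-supplied gradient; the step size and number of Verlet steps are illustrative parameters, not values from the article:

```python
import numpy as np

def hmc_step(q, V, gradV, beta=1.0, dt=0.1, n_verlet=10, rng=None):
    """One Hybrid Monte Carlo step for exp(-beta*(V(q) + |p|^2/2)), M = Id.

    q is a NumPy array; V and gradV are the potential and its gradient.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = rng.standard_normal(q.shape) / np.sqrt(beta)   # kinetic-part sample
    H_old = V(q) + 0.5 * float(p @ p)
    q_new, p_new = q.copy(), p.copy()
    for _ in range(n_verlet):                          # velocity Verlet steps
        p_new -= 0.5 * dt * gradV(q_new)
        q_new += dt * p_new
        p_new -= 0.5 * dt * gradV(q_new)
    H_new = V(q_new) + 0.5 * float(p_new @ p_new)
    r = np.exp(min(0.0, -beta * (H_new - H_old)))      # ratio r^n of the text
    return q_new if rng.random() < r else q            # on rejection keep q^n
```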

A final important example is given by parallel tempering strategies [10], in which several replicas of the system are simulated in parallel at different temperatures; from time to time, exchanges between two replicas at different temperatures are attempted, the probability of such an exchange being given by a Metropolis–Hastings ratio.
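The exchange step itself is simple. The hypothetical helper below (a sketch, with the replicas stored as a list of positions ordered by inverse temperature) accepts a swap between adjacent replicas with the Metropolis–Hastings ratio $\exp\bigl((\beta_i - \beta_{i+1})(V(q_i) - V(q_{i+1}))\bigr)$, which leaves the product of the canonical measures invariant:

```python
import numpy as np

def attempt_swap(qs, betas, V, rng=None):
    """Attempt one exchange between replicas at adjacent temperatures."""
    rng = np.random.default_rng() if rng is None else rng
    i = int(rng.integers(len(qs) - 1))
    # ratio of the product measure after/before swapping replicas i and i+1
    log_r = (betas[i] - betas[i + 1]) * (V(qs[i]) - V(qs[i + 1]))
    if rng.random() < np.exp(min(0.0, log_r)):
        qs[i], qs[i + 1] = qs[i + 1], qs[i]
```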

Continuous Stochastic Dynamics
A variety of stochastic dynamical methods are in use for sampling the canonical measure. For simplicity of exposition, we consider here systems with the $N$-body Hamiltonian $H = p^T M^{-1} p/2 + V(q)$.

Brownian Dynamics
Brownian dynamics is a stochastic dynamics on the position variable $q \in \mathcal{D}$ only:

$$dq_t = -\nabla V(q_t)\, dt + \sqrt{\frac{2}{\beta}}\, dW_t, \qquad (6)$$

where $W_t$ is a standard $3N$-dimensional Wiener process. It can be shown that this system is an ergodic process for the configurational invariant measure $\nu(dq) = Z_\nu^{-1} \exp(-\beta V(q))\, dq$, the ergodicity following from the elliptic nature of the generator of the process. The dynamics (6) may be solved numerically using the Euler–Maruyama scheme:

$$q^{n+1} = q^n - \Delta t\, \nabla V(q^n) + \sqrt{\frac{2\Delta t}{\beta}}\, G^n, \qquad (7)$$

where the $(G^n)_{n \ge 0}$ are independent and identically distributed (i.i.d.) centered Gaussian random vectors in $\mathbb{R}^{3N}$ with identity covariance matrix $E(G^n \otimes G^n) = \mathrm{Id}_{3N}$. Although the discretization scheme does not exactly preserve the canonical measure, it can be shown, under certain boundedness assumptions (see [12, 21]), that the numerical scheme is ergodic, with an invariant probability measure close to the canonical measure $\nu$ in a suitable norm. The numerical bias may be eliminated using a Metropolis rule; see, e.g., [16, 18].
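Scheme (7) takes only a few lines of code; a minimal sketch (gradient and parameter values are user-supplied, illustrative assumptions):

```python
import numpy as np

def euler_maruyama(gradV, q0, beta=1.0, dt=1e-3, n_steps=100_000, rng=None):
    """Euler-Maruyama discretization (7) of Brownian dynamics (6)."""
    rng = np.random.default_rng() if rng is None else rng
    q = np.array(q0, dtype=float)
    noise = np.sqrt(2.0 * dt / beta)        # amplitude of sqrt(2*dt/beta)*G^n
    traj = np.empty((n_steps,) + q.shape)
    for n in range(n_steps):
        q = q - dt * gradV(q) + noise * rng.standard_normal(q.shape)
        traj[n] = q
    return traj
```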

Langevin Dynamics
Hamiltonian dynamics preserves the energy, while a sampling of the canonical measure requires visiting all the energy levels. Langevin dynamics is a model of a Hamiltonian system coupled with a heat bath, defined by the following equations:

$$\begin{cases} dq_t = M^{-1} p_t\, dt, \\ dp_t = -\nabla V(q_t)\, dt - \gamma(q_t) M^{-1} p_t\, dt + \sigma(q_t)\, dW_t, \end{cases} \qquad (8)$$

where $W_t$ is a $3N$-dimensional standard Brownian motion, and $\gamma$ and $\sigma$ are (possibly position-dependent) $3N \times 3N$ real matrices. The term $\sigma(q_t)\, dW_t$ is a fluctuation term bringing energy into the system, this energy being dissipated through the viscous friction term $-\gamma(q_t) M^{-1} p_t\, dt$. The canonical measure is preserved precisely when the "fluctuation–dissipation" relation $\sigma \sigma^T = \frac{2\gamma}{\beta}$ is satisfied. Many variants and extensions of Langevin dynamics are available.

Using the Hörmander conditions, it is possible to demonstrate ergodicity of the system provided $\sigma(q)$ has full rank (i.e., a rank equal to $3N$) for all $q$ in position space. A spectral gap can also be demonstrated under appropriate assumptions on the potential energy function, relying on recent advances in hypocoercivity [22] or on Lyapunov techniques.

Brownian dynamics (6) may be viewed as either the noninertial limit ($m \to 0$) of Langevin dynamics or its overdamped limit ($\gamma \to \infty$), with a different time scaling in each case.

The discretization of stochastic differential equations such as Langevin dynamics is still a topic of research. Splitting methods, which divide the system into deterministic and stochastic components, are increasingly used for this purpose. As an illustration, one may adopt a method whereby Verlet integration is supplemented by an "exact" treatment of the Ornstein–Uhlenbeck process, replacing

$$dp_t = -\gamma(q_t) M^{-1} p_t\, dt + \sigma(q_t)\, dW_t$$

by a discrete process that samples the associated Gaussian distribution. In some cases, it is possible to show that such a method is ergodic.
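One concrete instance of such a splitting is sketched below. This is an illustration, not the specific scheme of the article: unit masses, a constant scalar friction $\gamma$, and the Ornstein–Uhlenbeck step realized exactly through its Gaussian transition are all simplifying assumptions.

```python
import numpy as np

def langevin_splitting(gradV, q, p, beta=1.0, gamma=1.0, dt=1e-2,
                       n_steps=10_000, rng=None):
    """Verlet pieces combined with an exact Ornstein-Uhlenbeck midpoint step
    for Langevin dynamics (8), with M = Id and constant scalar friction."""
    rng = np.random.default_rng() if rng is None else rng
    q, p = np.array(q, dtype=float), np.array(p, dtype=float)
    a = np.exp(-gamma * dt)                    # OU decay over one step
    b = np.sqrt((1.0 - a * a) / beta)          # exact Gaussian fluctuation
    traj = np.empty((n_steps,) + q.shape)
    for n in range(n_steps):
        p -= 0.5 * dt * gradV(q)               # half kick
        q += 0.5 * dt * p                      # half drift
        p = a * p + b * rng.standard_normal(p.shape)   # exact OU step
        q += 0.5 * dt * p                      # half drift
        p -= 0.5 * dt * gradV(q)               # half kick
        traj[n] = q
    return traj
```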

Numerical discretization methods for Langevin dynamics may be corrected in various ways to exactly preserve the canonical measure, using the Metropolis technique [5, 13] (see, e.g., the discussion in [9], Sect. 2.2).

Deterministic Dynamics on Extended Phase Spaces
It is possible to modify the Hamiltonian dynamics by the addition of control laws in order to sample the canonical (or some other) distribution. The simplest example of such a scheme is the Nosé–Hoover method [6, 15], which replaces Newton's equations of motion by the system

$$\begin{cases} \dot{q} = M^{-1} p, \\ \dot{p} = -\nabla V(q) - \xi\, p, \\ \dot{\xi} = Q^{-1}\,\bigl(p^T M^{-1} p - N k_B T\bigr), \end{cases}$$

where $Q > 0$ is a parameter and $\xi$ is the thermostat variable. It can be shown that this dynamics preserves the product distribution $e^{-\beta H(q,p)}\, e^{-\beta Q \xi^2/2}$ as a stationary macrostate. It is, in some cases (e.g., when the underlying system is linear), not ergodic, meaning that the invariant distribution is not unique [7]. Nonetheless, the method is still popular for sampling calculations. The best arguments for its continued success, which have not yet been founded rigorously, are that (a) molecular systems typically have large phase spaces and may incorporate liquid solvent, steep potentials, and other mechanisms that provide a strong internal diffusion property, and (b) any inaccessible regions in phase space may not contribute much to the averages of typical quantities of interest.
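For illustration only, a sketch of a direct integration of these equations (explicit Euler with unit masses; an assumption made for brevity — production codes use time-reversible splitting schemes instead):

```python
import numpy as np

def nose_hoover_euler(gradV, q, p, Q=1.0, kT=1.0, dt=1e-3, n_steps=100_000):
    """Explicit-Euler sketch of the Nose-Hoover system with M = Id.

    The thermostat variable xi steers the kinetic energy toward its
    canonical average; N is taken here as the number of components of q.
    """
    q, p = np.array(q, dtype=float), np.array(p, dtype=float)
    xi, n_dof = 0.0, q.size
    traj = np.empty((n_steps,) + q.shape)
    for n in range(n_steps):
        dq = p
        dp = -gradV(q) - xi * p
        dxi = (float(p @ p) - n_dof * kT) / Q
        q, p, xi = q + dt * dq, p + dt * dp, xi + dt * dxi
        traj[n] = q
    return traj
```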

The accuracy of sampling can sometimes be improved by stringing together "chains" of additional variables [11], but such methods may introduce additional and unneeded complexity (especially as there are more reliable alternatives; see below). When ergodicity is not a concern (e.g., when a detailed atomistic model of water is involved), an alternative to the Nosé–Hoover method is the Nosé–Poincaré method [1], which is derived from an extended Hamiltonian and which allows the use of symplectic integrators (preserving the phase-space volume and, approximately, the energy, and typically providing better long-term stability; see → Molecular Dynamics).

Hybrid Methods by Stochastic Modification
When ergodicity is an issue, it is possible to enhance extended dynamics methods by the incorporation of stochastic processes, for example, as defined by the addition of Ornstein–Uhlenbeck terms. One such method has been proposed in [19]. It replaces the Nosé–Hoover system by the highly degenerate stochastic system

$$\begin{cases} dq_t = M^{-1} p_t\, dt, \\ dp_t = \bigl(-\nabla V(q_t) - \xi_t\, p_t\bigr)\, dt, \\ d\xi_t = \left[ Q^{-1} \left( p_t^T M^{-1} p_t - \dfrac{N}{\beta} \right) - \gamma\, \xi_t \right] dt + \sqrt{\dfrac{2\gamma}{\beta Q}}\, dW_t, \end{cases}$$

which incorporates only a scalar noise process. This method has been called the Nosé–Hoover–Langevin method in [8], where ergodicity was also proved in the case of an underlying harmonic system ($V$ quadratic) under certain assumptions. A similar technique, the "Langevin piston" [4], has been suggested to control the pressure in molecular dynamics simulations, where the sampling is performed with respect to the NPT (isobaric–isothermal) ensemble.

References

1. Bond, S., Laird, B., Leimkuhler, B.: The Nosé–Poincaré method for constant temperature molecular dynamics. J. Comput. Phys. 151, 114–134 (1999)
2. Cancès, E., Legoll, F., Stoltz, G.: Theoretical and numerical comparison of sampling methods for molecular dynamics. Math. Model. Numer. Anal. 41(2), 351–390 (2007)
3. Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Phys. Lett. B 195(2), 216–222 (1987)
4. Feller, S., Zhang, Y., Pastor, R., Brooks, B.: Constant pressure molecular dynamics simulation: the Langevin piston method. J. Chem. Phys. 103(11), 4613–4621 (1995)
5. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970)
6. Hoover, W.: Canonical dynamics: equilibrium phase space distributions. Phys. Rev. A 31, 1695–1697 (1985)
7. Legoll, F., Luskin, M., Moeckel, R.: Non-ergodicity of the Nosé–Hoover thermostatted harmonic oscillator. Arch. Ration. Mech. Anal. 184, 449–463 (2007)
8. Leimkuhler, B., Noorizadeh, N., Theil, F.: A gentle stochastic thermostat for molecular dynamics. J. Stat. Phys. 135(2), 261–277 (2009)
9. Lelièvre, T., Rousset, M., Stoltz, G.: Free Energy Computations: A Mathematical Perspective. Imperial College Press, London/Hackensack (2010)
10. Marinari, E., Parisi, G.: Simulated tempering: a new Monte Carlo scheme. Europhys. Lett. 19(6), 451–458 (1992)
11. Martyna, G., Klein, M., Tuckerman, M.: Nosé–Hoover chains: the canonical ensemble via continuous dynamics. J. Chem. Phys. 97(4), 2635–2643 (1992)
12. Mattingly, J.C., Stuart, A.M., Higham, D.J.: Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise. Stoch. Process. Appl. 101(2), 185–232 (2002)
13. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equations of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1091 (1953)
14. Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Communications and Control Engineering Series. Springer, London/New York (1993)
15. Nosé, S.: A molecular-dynamics method for simulations in the canonical ensemble. Mol. Phys. 52, 255–268 (1984)
16. Roberts, G.O., Rosenthal, J.S.: Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. B 60, 255–268 (1998)
17. Roberts, G.O., Gelman, A., Gilks, W.R.: Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7, 110–120 (1997)
18. Rossky, P.J., Doll, J.D., Friedman, H.L.: Brownian dynamics as smart Monte Carlo simulation. J. Chem. Phys. 69, 4628–4633 (1978)
19. Samoletov, A., Chaplain, M.A.J., Dettmann, C.P.: Thermostats for "slow" configurational modes. J. Stat. Phys. 128, 1321–1336 (2007)
20. Schütte, C.: Habilitation thesis. Freie Universität Berlin, Berlin (1999). http://publications.mi.fu-berlin.de/89/
21. Talay, D., Tubaro, L.: Expansion of the global error for numerical schemes solving stochastic differential equations. Stoch. Anal. Appl. 8(4), 483–509 (1990)
22. Villani, C.: Hypocoercivity. Mem. Am. Math. Soc. 202(950), 141 (2009)

Schrödinger Equation for Chemistry

Harry Yserentant
Institut für Mathematik, Technische Universität Berlin, Berlin, Germany

Mathematics Subject Classification

81-02; 81V55; 35J10

Short Definition

The Schrödinger equation forms the basis of nonrelativistic quantum mechanics and is fundamental for our understanding of atoms and molecules. The entry motivates this equation and embeds it into the general framework of quantum mechanics.

Description

Introduction
Quantum mechanics links chemistry to physics. Conceptions arising from quantum mechanics form the framework for our understanding of atomic and molecular processes. The history of quantum mechanics began around 1900 with Planck's analysis of black-body radiation, Einstein's interpretation of the photoelectric effect, and Bohr's theory of the hydrogen atom. A unified framework allowing for a systematic study of quantum phenomena arose, however, only in the 1920s. The starting point was de Broglie's observation of the wave-like behavior of matter, finally resulting in the Schrödinger equation [8], and [3] for the multiparticle case. The purpose of this article is to motivate this equation from some basic principles and to sketch at the same time the mathematical structure of quantum mechanics. More information can be found in textbooks on quantum mechanics like Atkins and Friedman [1] or Thaller [11, 12]. The first is particularly devoted to the understanding of the molecular processes that are important for chemistry; the second and third place more emphasis on the mathematical structure and contain many impressive visualizations. The monograph [4] gives an introduction to the mathematical theory. A historically very interesting text, in which the mathematical framework of quantum mechanics was established and which was at the same time a milestone in the development of spectral theory, is von Neumann's seminal treatise [13]. The present exposition is largely taken from Yserentant [14].

The Schrödinger Equation of a Free Particle
Let us first recall the notion of a plane wave, a complex-valued function

$$\mathbb{R}^d \times \mathbb{R} \to \mathbb{C} : (x, t) \to e^{i k \cdot x - i \omega t}, \qquad (1)$$

with $k \in \mathbb{R}^d$ the wave vector and $\omega \in \mathbb{R}$ the frequency. A dispersion relation $\omega = \omega(k)$ assigns to each wave vector a characteristic frequency. Such dispersion relations fix the physics that is described by this kind of waves. Most common is the case $\omega = c|k|$, which arises, for example, in the propagation of light in vacuum. When the wave nature of matter was recognized, the problem was to guess the dispersion relation for the matter waves: to guess, as this hypothesis creates a new kind of physics that cannot be deduced from known theories. A good starting point is Einstein's interpretation of the photoelectric effect. When polished metal plates are irradiated by light of sufficiently short wave length, they may emit electrons. The magnitude of the electron current is, as expected, proportional to the intensity of the light source, but their energy depends, surprisingly, on the wave length or the frequency of the incoming light. Einstein's explanation was that light consists of single light quanta with energy and momentum

$$E = \hbar \omega, \qquad p = \hbar k, \qquad (2)$$

depending on the frequency $\omega$ and the wave vector $k$. The quantity

$$\hbar = 1.0545716 \times 10^{-34}\ \mathrm{kg\, m^2\, s^{-1}}$$

is Planck's constant, an incredibly small quantity of the dimension energy × time, called action, reflecting the size of the systems quantum mechanics deals with. To obtain from (2) a dispersion relation, Schrödinger started first from the energy–momentum relation of special relativity, but this led, for reasons not to be discussed here, to wrong predictions. He therefore fell back upon the energy–momentum relation

$$E = \frac{1}{2m}\, |p|^2$$

from classical, Newtonian mechanics. It leads to the dispersion relation

$$\omega = \frac{\hbar}{2m}\, |k|^2$$

for the plane waves (1). These plane waves can be superimposed to form wave packets

$$\psi(x, t) = \left(\frac{1}{\sqrt{2\pi}}\right)^{3} \int e^{-i \frac{\hbar}{2m} |k|^2 t}\, \widehat{\psi}_0(k)\, e^{i k \cdot x}\, dk. \qquad (3)$$

These wave packets are the solutions of the partial differential equation

$$i \hbar\, \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m}\, \Delta \psi, \qquad (4)$$

the Schrödinger equation for a free particle of mass $m$ in the absence of external forces.

The Schrödinger equation (4) is of first order in time. Its solutions, the wavefunctions of free particles, are uniquely determined by their initial state $\psi_0$. If $\psi_0$ is a rapidly decreasing function (in the Schwartz space), the solution possesses time derivatives of arbitrary order, and all of them are rapidly decreasing functions of the spatial variables. To avoid technicalities, we assume this for the moment. We further observe that

$$\int |\psi(x,t)|^2\, dx = \int |\widehat{\psi}(k,t)|^2\, dk$$

remains constant in time. This follows from Plancherel's theorem, a central result of Fourier analysis. We assume in the sequel that this value is normalized to 1, which is basic for the statistical interpretation of the wavefunctions $\psi$. The quantities $|\psi|^2$ and $|\widehat{\psi}|^2$ can then be interpreted as probability densities. The integrals

$$\int_{\Omega} |\psi(x,t)|^2\, dx, \qquad \int_{\widehat{\Omega}} |\widehat{\psi}(k,t)|^2\, dk$$

represent the probabilities to find the particle at time $t$ in the region $\Omega$ of position space, respectively in the region $\widehat{\Omega}$ of momentum space. The quantity

$$\int \frac{\hbar^2}{2m}\, |k|^2\, |\widehat{\psi}(k,t)|^2\, dk$$

is the expectation value of the kinetic energy. With the help of the Hamilton operator

$$H = -\frac{\hbar^2}{2m}\, \Delta, \qquad (5)$$

this expectation value can be rewritten as

$$\int \overline{\psi}\, H\psi\, dx = (\psi, H\psi).$$

The expectation values of the components of the momentum are, in vector notation,

$$\int \hbar k\, |\widehat{\psi}(k,t)|^2\, dk.$$

Introducing the momentum operator

$$p = -i \hbar\, \nabla, \qquad (6)$$

their position representation is the inner product

$$\int \overline{\psi}\, p\psi\, dx = (\psi, p\psi).$$

The expectation values of the three components of the particle position are finally

$$\int x\, |\psi(x,t)|^2\, dx = (\psi, q\psi),$$

with $q$ the position operator given by $\psi \to x\psi$. This coincidence between observable physical quantities like energy, momentum, or position and operators acting upon the wavefunctions is in no way accidental. It forms the heart of quantum mechanics.

The Mathematical Framework of Quantum Mechanics
We have seen that the physical state of a free particle at a given time $t$ is completely determined by a function in the Hilbert space $L^2$ that again depends uniquely on the state at a given initial time. In the case of more general systems, the space $L^2$ is replaced by another Hilbert space, but the general concept remains:

Postulate 1. A quantum-mechanical system consists of a complex Hilbert space $\mathcal{H}$ with inner product $(\cdot\,, \cdot)$ and a one-parameter group $U(t)$, $t \in \mathbb{R}$, of unitary linear operators on $\mathcal{H}$ with

$$U(0) = I, \qquad U(s + t) = U(s)\, U(t),$$

that is strongly continuous in the sense that, for all $\psi \in \mathcal{H}$, in the Hilbert space norm

$$\lim_{t \to 0} U(t)\psi = \psi.$$

A state of the system corresponds to a normalized vector in $\mathcal{H}$. The time evolution of the system is described by the group of the propagators $U(t)$; the state

$$\psi(t) = U(t)\, \psi(0) \qquad (7)$$

of the system at time $t$ is uniquely determined by its state at time $t = 0$.

In the case of the free particles considered so far, the solution of the Schrödinger equation, and with that the time evolution, is given by (3). The evolution operators $U(t)$, or propagators, read therefore, in the Fourier or momentum representation,

$$\widehat{\psi}(k) \to e^{-i \frac{\hbar}{2m} |k|^2 t}\, \widehat{\psi}(k).$$

Strictly speaking, they have first been defined only for rapidly decreasing functions, functions in a dense subspace of $L^2$, but it is obvious from Plancherel's theorem that they can be uniquely extended from there to $L^2$ and have the required properties.
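Numerically, this momentum-representation propagator is exactly what the fast Fourier transform provides. A minimal one-dimensional sketch ($\hbar = m = 1$; the Gaussian initial state and the periodic grid are illustrative assumptions, not part of the original text):

```python
import numpy as np

# Free-particle propagation psi_hat(k) -> exp(-i*k^2*t/2) * psi_hat(k),
# realized with the FFT on a periodic grid (1D, hbar = m = 1).
L, n = 40.0, 1024
dx = L / n
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)

psi = np.exp(-0.5 * (x + 5.0) ** 2 + 2.0j * x)   # Gaussian wave packet
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)    # normalize to 1

def propagate(psi, t):
    """Apply the free propagator U(t) in the Fourier representation."""
    return np.fft.ifft(np.exp(-0.5j * k**2 * t) * np.fft.fft(psi))

psi_t = propagate(psi, t=1.0)
print(np.sum(np.abs(psi_t) ** 2) * dx)           # unitarity: stays ≈ 1
```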

The next step is to move from Postulate 1 to an abstract version of the Schrödinger equation. For that, we have to establish a connection between such strongly continuous groups of unitary operators and abstract Hamilton operators. Let $D(H)$ be the linear subspace of the given system Hilbert space $\mathcal{H}$ that consists of those elements $\psi$ in $\mathcal{H}$ for which the limit

$$H\psi = i\hbar\, \lim_{\tau \to 0} \frac{U(\tau) - I}{\tau}\, \psi$$

exists in the sense of norm convergence. The mapping $\psi \to H\psi$ from the domain $D(H)$ into the Hilbert space $\mathcal{H}$ is then called the generator $H$ of the group. The generator of the evolution operator of the free particle is the operator

$$H = -\frac{\hbar^2}{2m}\, \Delta \qquad (8)$$

with the Sobolev space $H^2$ as domain of definition $D(H)$. In view of this observation, the following result for the general abstract case is unsurprising:

Theorem 1. For all initial values $\psi(0)$ in the domain $D(H)$ of the generator of the group of the propagators $U(t)$, the elements (7) are contained in $D(H)$, too, depend continuously differentiably on $t$, and satisfy the differential equation

$$i\hbar\, \frac{d}{dt}\psi(t) = H\psi(t). \qquad (9)$$

It should be noted once more, however, that the differential equation (9), the abstract Schrödinger equation, makes sense only for initial values in the domain of the generator $H$, but that the propagators are defined on the whole Hilbert space.

A little calculation shows that the generators of one-parameter unitary groups are necessarily symmetric. More than that, they are even selfadjoint. There is a direct correspondence between unitary groups and selfadjoint operators, Stone's theorem, a cornerstone in the mathematical foundation of quantum mechanics:

Theorem 2. If $U(t)$, $t \in \mathbb{R}$, is a one-parameter unitary group as in Postulate 1, the domain $D(H)$ of its generator $H$ is a dense subset of the underlying Hilbert space and the generator itself is selfadjoint. Every selfadjoint operator $H$ is conversely the generator of such a one-parameter unitary group, which is usually denoted as

$$U(t) = e^{-\frac{i}{\hbar} H t}.$$

Instead of by the unitary group of the propagators, a quantum-mechanical system can thus equivalently be fixed by the generator $H$ of this group, the Hamilton operator, or, in the language of physics, the Hamiltonian of the system.

In our discussion of the free particle, we have seen that there is a direct correspondence between the expectation values of the energy, the momentum, and the position of the particle and the energy or Hamilton operator (5), the momentum operator (6), and the position operator $\psi \to x\psi$. Each of these operators is selfadjoint. This reflects the general structure of quantum mechanics:

Postulate 2. Observable physical quantities, or observables, are in quantum mechanics represented by selfadjoint operators $A : D(A) \to \mathcal{H}$ defined on dense subspaces $D(A)$ of the system Hilbert space $\mathcal{H}$. The quantity

$$\langle A \rangle = (\psi, A\psi) \qquad (10)$$

is the expectation value of a measurement of $A$ for the system in state $\psi \in D(A)$.

At this point, we have to recall the statistical nature of quantum mechanics. Quantum mechanics does not make predictions on the outcome of a single measurement of a quantity $A$ but only on the mean result of a large number of measurements on "identically prepared" states. The quantity (10) has thus to be interpreted as the mean result that one obtains from a large number of such measurements. This gives reason to consider the standard deviation or uncertainty

$$\Delta A = \| A\psi - \langle A \rangle \psi \|$$

for states $\psi \in D(A)$. The uncertainty is zero if and only if $A\psi = \langle A \rangle \psi$, that is, if $\psi$ is an eigenvector of $A$ for the eigenvalue $\langle A \rangle$. Only in such eigenstates can the quantity represented by the operator $A$ be sharply measured without uncertainty. The likelihood that a measurement returns a value outside the spectrum of $A$ is zero.

One of the fundamental results of quantum mechanics is that, only in exceptional cases, can different physical quantities be measured simultaneously without uncertainty, the Heisenberg uncertainty principle. Its abstract version reads as follows:

Theorem 3. Let $A$ and $B$ be two selfadjoint operators and let $\psi$ be a normalized state in the intersection of $D(A)$ and $D(B)$ such that $A\psi \in D(B)$ and $B\psi \in D(A)$. The product of the corresponding uncertainties is then bounded from below by

$$\Delta A\, \Delta B \ge \frac{1}{2}\, \bigl|((BA - AB)\psi, \psi)\bigr|. \qquad (11)$$

The proof is an exercise in linear algebra. As an example, we consider the components

$$q_k = x_k, \qquad p_k = -i\hbar\, \frac{\partial}{\partial x_k}$$

of the position and the momentum operator. Their commutators are

$$q_k p_k - p_k q_k = i\hbar\, I.$$

This results in the Heisenberg uncertainty principle

$$\Delta p_k\, \Delta q_k \ge \frac{1}{2}\, \hbar. \qquad (12)$$

Position and momentum therefore can never be determined simultaneously without uncertainty, independent of the considered state of the system. The inequality (12), and with that also (11), is sharp, as the instructive example

$$\psi(x) = \left(\frac{1}{\sqrt{\vartheta}}\right)^{3} \psi_0\!\left(\frac{x}{\vartheta}\right)$$

of the rescaled three-dimensional Gauss functions

$$\psi_0(x) = \frac{1}{\sqrt{\pi^{3/2}}}\, \exp\!\left(-\frac{1}{2}\, |x|^2\right)$$

of arbitrary width demonstrates. For these wavefunctions, the inequality (12) actually turns into an equality. From

$$\widehat{\psi}(k) = (\sqrt{\vartheta})^3\, \widehat{\psi}_0(\vartheta k)$$

one recognizes that a sharp localization in space, that is, a small parameter $\vartheta$ determining the width of $\psi$, is combined with a loss of localization in momentum.
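This saturation of (12) is easy to check numerically. A one-dimensional sketch ($\hbar = 1$; grid size and the value of $\vartheta$, denoted theta, are illustrative assumptions) computes $\Delta q$ and $\Delta p$ for the rescaled Gaussian and recovers the product $1/2$ independently of the width:

```python
import numpy as np

# Check that the rescaled Gaussian saturates (12): dq * dp = 1/2 (hbar = 1).
theta = 0.3                                    # width parameter, illustrative
L, n = 40.0, 4096
dx = L / n
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
psi = (np.pi * theta**2) ** (-0.25) * np.exp(-x**2 / (2 * theta**2))

k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
psi_k = np.fft.fft(psi) * dx                   # approximates the Fourier transform
dk = 2.0 * np.pi / L

dq = np.sqrt(np.sum(x**2 * np.abs(psi) ** 2) * dx)           # <x> = 0 by symmetry
dp = np.sqrt(np.sum(k**2 * np.abs(psi_k) ** 2) * dk / (2 * np.pi))
print(dq * dp)                                 # ≈ 0.5 for any theta
```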

States with a well-defined, sharp energy $E$ play a particularly important role in quantum mechanics, that is, solutions $\psi \ne 0$ in $\mathcal{H}$ of the eigenvalue problem

$$H\psi = E\psi,$$

the stationary Schrödinger equation. The functions

$$t \to e^{-i \frac{E}{\hbar} t}\, \psi$$

then represent solutions of the original time-dependent Schrödinger equation. The main focus of quantum chemistry is on stationary Schrödinger equations.

The Quantum Mechanics of Multiparticle Systems
Let us assume that we have a finite collection of $N$ particles of different kinds with the spaces $L^2(\Omega_i)$ as system Hilbert spaces. The Hilbert space describing the system composed of these particles is then the tensor product of these Hilbert spaces, or a subspace of this space, that is, a space of square integrable wavefunctions

$$\psi : \Omega_1 \times \ldots \times \Omega_N \to \mathbb{C}$$

with the $N$-tuples $(\xi_1, \ldots, \xi_N)$, $\xi_i \in \Omega_i$, as arguments. From the point of view of mathematics, this is of course another postulate that can in a strict sense not be derived from anything else, but it is motivated by the statistical interpretation of the wavefunctions and of the quantity $|\psi|^2$ as a probability density. Quantum-mechanical particles of the same type, like electrons, can, however, not be distinguished from each other by any means or experiment. This is both a physical statement and a mathematical postulate that needs to be specified precisely. It has striking consequences for the form of the physically admissible wavefunctions and of the Hilbert spaces that describe such systems of indistinguishable particles.

To understand these consequences, we have to recall that an observable quantity like momentum or energy is described in quantum mechanics by a selfadjoint operator $A$ and that the inner product $(\psi, A\psi)$ represents the expectation value for the outcome of a measurement of this quantity in the physical state described by the normalized wavefunction $\psi$. At least a necessary condition that two normalized elements or unit vectors $\psi$ and $\psi'$ in the system Hilbert space $\mathcal{H}$ describe the same physical state is surely that $(\psi, A\psi) = (\psi', A\psi')$ for all selfadjoint operators $A : D(A) \subseteq \mathcal{H} \to \mathcal{H}$ whose domain $D(A)$ contains both $\psi$ and $\psi'$, that is, that the expectation values of all possible observables coincide. This requirement fixes such states almost completely. Wavefunctions that describe the same physical state can differ at most by a constant phase shift $\psi \to e^{i\alpha}\psi$, $\alpha$ a real number. Wavefunctions that differ by such a phase shift lead to the same expectation values of observable quantities. The proof is again an exercise in linear algebra. In view of this discussion, the requirements on the wavefunctions describing a system of indistinguishable particles are rather obvious and can be formulated in terms of the operations that formally exchange the single particles:

Postulate 3. The Hilbert space of a system of $N$ indistinguishable particles with system Hilbert space $L^2(\Omega)$ consists of complex-valued, square integrable functions

$$\psi : (\xi_1, \ldots, \xi_N) \to \psi(\xi_1, \ldots, \xi_N)$$

on the $N$-fold cartesian product of $\Omega$, that is, it is a subspace of $L^2(\Omega^N)$. For every $\psi$ in this space and every permutation $P$ of the arguments $\xi_i$, the function $\xi \to \psi(P\xi)$ is also in this space, and moreover it differs from $\psi$ at most by a constant phase shift.

This postulate can rather easily be translated into a symmetry condition on the wavefunctions that governs the quantum mechanics of multiparticle systems:

Theorem 4. The Hilbert space describing a system of indistinguishable particles either consists completely of antisymmetric wavefunctions, functions for which

$$\psi(P\xi) = \operatorname{sign}(P)\, \psi(\xi)$$

holds for all permutations $P$ of the components $\xi_1, \ldots, \xi_N$ of $\xi$, that is, of the single particles, or only of symmetric wavefunctions, wavefunctions for which

$$\psi(P\xi) = \psi(\xi)$$

holds for all permutations $P$ of the arguments.

Which of the two choices is realized depends solely on the kind of particles and cannot be decided in the present framework. Particles with antisymmetric wavefunctions are called fermions and particles with symmetric wavefunctions bosons.

Quantum chemistry is mainly interested in electrons. Electrons have a position in space and an internal property called spin that in many respects behaves like an angular momentum. The spin $\sigma$ of an electron can attain the two values $\sigma = \pm 1/2$. The configuration space of an electron is therefore not $\mathbb{R}^3$ but the cartesian product

$$\Omega = \mathbb{R}^3 \times \{-1/2, +1/2\}.$$

The space $L^2(\Omega)$ consists of the functions $\psi : \Omega \to \mathbb{C}$ with square integrable components $x \to \psi(x, \sigma)$, $\sigma = \pm 1/2$, and is equipped with the inner product

$$(\psi, \psi') = \sum_{\sigma = \pm 1/2} \int \overline{\psi(x, \sigma)}\, \psi'(x, \sigma)\, dx.$$

A system of $N$ electrons is correspondingly described by wavefunctions

$$\psi : (\mathbb{R}^3)^N \times \{-1/2, +1/2\}^N \to \mathbb{C} \qquad (13)$$

with square integrable components $x \to \psi(x, \sigma)$, where $x \in \mathbb{R}^{3N}$ and $\sigma$ is a vector consisting of $N$ spins $\sigma_i = \pm 1/2$. These wavefunctions are equipped with the inner product

$$(\psi, \psi') = \sum_{\sigma} \int \overline{\psi(x, \sigma)}\, \psi'(x, \sigma)\, dx,$$

where the sum now runs over the $2^N$ possible spin vectors $\sigma$.

Electrons are fermions, as are all particles with half-integer spin. That is, the wavefunctions change their sign under a simultaneous exchange of the positions $x_i$ and $x_j$ and the spins $\sigma_i$ and $\sigma_j$ of electrons $i \ne j$. They are, in other words, antisymmetric in the sense that

$$\psi(Px, P\sigma) = \operatorname{sign}(P)\, \psi(x, \sigma)$$

holds for arbitrary simultaneous permutations $x \to Px$ and $\sigma \to P\sigma$ of the electron positions and spins. This is a general version of the Pauli principle, a principle that is of fundamental importance for the physics of atoms and molecules. The Pauli principle has stunning consequences. It entangles the electrons with each other, without the presence of any direct interaction force. A wavefunction (13) describing such a system vanishes at points $(x, \sigma)$ at which $x_i = x_j$ and $\sigma_i = \sigma_j$ for indices $i \ne j$. This means that two electrons with the same spin cannot meet at the same place, a purely quantum-mechanical repulsion effect that has no counterpart in classical physics.

The Molecular Schrödinger Equation
Neglecting spin, the system Hilbert space of an atom or molecule consisting of $N$ particles (electrons and nuclei) is the space $L^2(\mathbb{R}^3)^N = L^2(\mathbb{R}^{3N})$. The Hamilton operator

$$H = -\sum_{i=1}^{N} \frac{1}{2 m_i}\, \Delta_i + \frac{1}{2} \sum_{\substack{i,j=1 \\ i \ne j}}^{N} \frac{Q_i Q_j}{|x_i - x_j|}, \qquad (14)$$

written down here in dimensionless form, is derived via the correspondence principle from its counterpart in classical physics, the Hamilton function

$$H(p, q) = \sum_{i=1}^{N} \frac{1}{2 m_i}\, |p_i|^2 + \frac{1}{2} \sum_{\substack{i,j=1 \\ i \ne j}}^{N} \frac{Q_i Q_j}{|q_i - q_j|},$$

or total energy of a system of point-like particles in the potential field

$$V(q) = \frac{1}{2} \sum_{\substack{i,j=1 \\ i \ne j}}^{N} \frac{Q_i Q_j}{|q_i - q_j|}.$$

The $m_i$ are the masses of the particles in multiples of the electron mass and the $Q_i$ the charges of the particles in multiples of the electron charge. As was first shown by Kato [7], the Hamilton operator (14) can be uniquely extended from the space of the infinitely differentiable functions with bounded support to a selfadjoint operator $H$ from its domain of definition $D(H) \subseteq L^2(\mathbb{R}^{3N})$ to $L^2(\mathbb{R}^{3N})$. It fits therefore into the abstract framework of quantum mechanics sketched above. The domain $D(H)$ of the extended operator is the Sobolev space $H^2$ consisting of the twice weakly differentiable functions with first- and second-order weak derivatives in $L^2$, respectively a subspace of this Sobolev space consisting of components of the full, spin-dependent wavefunctions in accordance with the Pauli principle if spin is taken into account. The resulting Schrödinger equation

$$i\, \frac{\partial \psi}{\partial t} = H\psi$$

is an extremely complicated object, because of the high dimensionality of the problem, but also because of the oscillatory character of its solutions and the many different time scales on which they vary, which can range over many orders of magnitude. Comprehensive survey articles on the properties of atomic and molecular Schrödinger operators are Hunziker and Sigal [6] and Simon [9].

Following Born and Oppenheimer [2], the full problem is usually split into the electronic Schrödinger equation describing the motion of the electrons in the field of given clamped nuclei, and an equation for the motion of the nuclei in a potential field that is determined by solutions of the electronic equation. The transition from the full Schrödinger equation, taking into account also the motion of the nuclei, to the electronic Schrödinger equation is a mathematically very subtle problem; see [10] and the literature cited therein, or the article of Hagedorn (→ Born–Oppenheimer Approximation, Adiabatic Limit, and Related Math. Issues), for more information. The intuitive idea behind this splitting is that the electrons move much more rapidly than the much heavier nuclei and almost instantaneously follow their motion. Most of quantum chemistry is devoted to the solution of the stationary electronic Schrödinger equation, the eigenvalue problem for the electronic Hamilton operator

$$H = -\frac{1}{2} \sum_{i=1}^{N} \Delta_i + V_0(x) + \frac{1}{2} \sum_{\substack{i,j=1 \\ i \ne j}}^{N} \frac{1}{|x_i - x_j|},$$

again written down in dimensionless form, where

$$V_0(x) = -\sum_{i=1}^{N} \sum_{\nu=1}^{K} \frac{Z_\nu}{|x_i - a_\nu|}$$

is the nuclear potential. It acts on functions with arguments $x_1, \ldots, x_N$ in $\mathbb{R}^3$, which are associated with the positions of the considered electrons. The $a_\nu$ are the now fixed positions of the nuclei and the values $Z_\nu$ the charges of the nuclei in multiples of the electron charge. The equation has still to be supplemented by the symmetry constraints arising from the Pauli principle.
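To make the notation concrete, a small sketch evaluating the potential part of this operator, $V_0(x)$ plus the electron–electron repulsion, for given electron and clamped nucleus positions (dimensionless atomic units; the helium-like example data are purely illustrative, not from the article):

```python
import numpy as np

def electronic_potential(x, a, Z):
    """Potential part of the electronic Hamiltonian (dimensionless units).

    x : (N, 3) electron positions; a : (K, 3) clamped nuclear positions;
    Z : (K,) nuclear charges.  Returns V0(x) plus electron repulsion.
    """
    v = 0.0
    for i in range(len(x)):                      # nuclear attraction V0
        for nu in range(len(a)):
            v -= Z[nu] / np.linalg.norm(x[i] - a[nu])
    for i in range(len(x)):                      # electron-electron repulsion
        for j in range(i + 1, len(x)):           # (1/2) sum_{i != j} = sum_{i < j}
            v += 1.0 / np.linalg.norm(x[i] - x[j])
    return v

# Example: two electrons in the field of a helium-like nucleus at the origin.
x = np.array([[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0]])
print(electronic_potential(x, a=np.zeros((1, 3)), Z=np.array([2.0])))
```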

The spectrum of the electronic Schrödinger operator is bounded from below. Its essential spectrum is, by the Hunziker–van Winter–Zhislin theorem, a semi-infinite interval; see [4] for details. Of interest for chemistry are configurations of electrons and nuclei for which the minimum of the total spectrum is an isolated eigenvalue of finite multiplicity, the ground state energy of the system. The assigned eigenfunctions, the ground states, as well as all other eigenfunctions for eigenvalues below the essential spectrum, then decay exponentially. That means that the nuclei can bind all electrons. More information on the mathematical properties of these eigenfunctions can be found in → Exact Wavefunctions Properties. Chemists are mainly interested in the ground states. The position of the nuclei is then determined by minimizing the ground state energy as a function of their positions, a process that treats the nuclei as classical objects. It is called geometry optimization.

The Born–Oppenheimer approximation is only a first step toward the computationally feasible models that are actually used in quantum chemistry. The historically first and simplest of these models is the Hartree–Fock model, in which the true wavefunctions are approximated by correspondingly antisymmetrized tensor products

$$u(x) = \prod_{i=1}^{N} \phi_i(x_i)$$

of functions $\phi_i$ of the electron positions $x_i \in \mathbb{R}^3$. These orbital functions are then determined via a variational principle. This intuitively very appealing ansatz often leads to surprisingly accurate results. Quantum chemistry is full of improvements and extensions of this basic approach; see the comprehensive monograph [5] for further information. Many entries in this encyclopedia are devoted to quantum chemical models and approximation methods derived from the Schrödinger equation. We refer in particular to the article (→ Hartree–Fock Type Methods) on the Hartree–Fock method, to the contributions (→ Post-Hartree–Fock Methods and Excited States Modeling) on post-Hartree–Fock methods and (→ Coupled-Cluster Methods) on the coupled cluster method, and to the article (→ Density Functional Theory) on density functional theory. Time-dependent problems are treated in the contribution (→ Quantum Time-Dependent Problems).


References

1. Atkins, P., Friedman, R.: Molecular Quantum Mechanics. Oxford University Press, Oxford (1997)
2. Born, M., Oppenheimer, R.: Zur Quantentheorie der Molekeln. Ann. Phys. 84, 457–484 (1927)
3. Dirac, P.: Quantum mechanics of many electron systems. Proc. R. Soc. Lond. A Math. Phys. Eng. Sci. 123, 714–733 (1929)
4. Gustafson, S., Sigal, I.: Mathematical Concepts of Quantum Mechanics. Springer, Berlin/Heidelberg/New York (2003)
5. Helgaker, T., Jørgensen, P., Olsen, J.: Molecular Electronic Structure Theory. Wiley, Chichester (2000)
6. Hunziker, W., Sigal, I.: The quantum N-body problem. J. Math. Phys. 41, 3448–3510 (2000)
7. Kato, T.: Fundamental properties of Hamiltonian operators of Schrödinger type. Trans. Am. Math. Soc. 70, 195–221 (1951)
8. Schrödinger, E.: Quantisierung als Eigenwertproblem. Ann. Phys. 79, 361–376 (1926)
9. Simon, B.: Schrödinger operators in the twentieth century. J. Math. Phys. 41, 3523–3555 (2000)
10. Teufel, S.: Adiabatic Perturbation Theory in Quantum Dynamics. Lecture Notes in Mathematics, vol. 1821. Springer, Berlin/Heidelberg/New York (2003)
11. Thaller, B.: Visual Quantum Mechanics. Springer, New York (2000)
12. Thaller, B.: Advanced Visual Quantum Mechanics. Springer, New York (2004)
13. von Neumann, J.: Mathematische Grundlagen der Quantenmechanik. Springer, Berlin (1932)
14. Yserentant, H.: Regularity and Approximability of Electronic Wave Functions. Lecture Notes in Mathematics, vol. 2000. Springer, Heidelberg/Dordrecht/London/New York (2010)

Schrödinger Equation: Computation

Shi Jin
Department of Mathematics and Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, China
Department of Mathematics, University of Wisconsin, Madison, WI, USA

The Schrödinger Equation

The linear Schrödinger equation is a fundamental equation of quantum mechanics that describes the complex-valued wave function $\Phi(t, x, y)$ of molecules or atoms,

$$i\hbar\, \partial_t \Phi(t, x, y) = H \Phi(t, x, y), \qquad x \in \mathbb{R}^{3N},\ y \in \mathbb{R}^{3n}, \qquad (1)$$

where the vectors $x$ and $y$ denote the positions of the $N$ nuclei and $n$ electrons, respectively, while $\hbar$ is the reduced Planck constant. The molecular Hamiltonian operator $H$ consists of two parts, the kinetic energy operator of the nuclei and the electronic Hamiltonian $H_e$ for fixed nucleonic configuration:

$$H = -\sum_{j=1}^{N} \frac{\hbar^2}{2 M_j}\, \Delta_{x_j} + H_e(y; x),$$

with

$$H_e(y; x) = -\sum_{j=1}^{n} \frac{\hbar^2}{2 m_j}\, \Delta_{y_j} + \sum_{j<k} \frac{1}{|y_j - y_k|} + \sum_{j<k} \frac{Z_j Z_k}{|x_j - x_k|} - \sum_{j=1}^{N} \sum_{k=1}^{n} \frac{Z_j}{|x_j - y_k|}.$$

Here $m_j$ denotes the mass of the $j$-th electron, and $M_j$, $Z_j$ denote the mass and charge of the $j$-th nucleus. The electronic Hamiltonian $H_e$ consists of the kinetic energy of the electrons as well as the interelectronic repulsion potential, the internuclear repulsion potential, and the electronic–nuclear attraction potential.

The Born–Oppenheimer Approximation

The main computational challenge in solving the Schrödinger equation is the high dimensionality of the molecular configuration space $\mathbb{R}^{3(N+n)}$. For example, the carbon dioxide molecule CO₂ consists of 3 nuclei and 22 electrons; thus one has to solve the full time-dependent Schrödinger equation in the space $\mathbb{R}^{75}$, which is a formidable task. The Born–Oppenheimer approximation [1] is a commonly used approach in computational chemistry and physics to reduce the degrees of freedom.

This approximation is based on the mass discrepancy between the light electrons, which move fast and are thus treated quantum mechanically, and the heavy nuclei, which move more slowly and are treated classically. Here one first solves the following time-independent electronic eigenvalue problems:


$$H_e(y; x)\, \psi_k(y; x) = E_k(x)\, \psi_k(y; x), \qquad \forall x \in \mathbb{R}^{3N}, \quad k = 1, 2, \ldots \qquad (2)$$

Assume that the spectrum of $H_e$, a self-adjoint operator, is discrete, with a complete set of orthonormal eigenfunctions $\{\psi_k(y; x)\}$, called the adiabatic basis, over the electronic coordinates for every fixed nucleus coordinate $x$, i.e.,

$$\int_{-\infty}^{\infty} \psi_j^*(y; x)\, \psi_k(y; x)\, dy = \delta_{jk},$$

where $\delta_{jk}$ is the Kronecker delta. The electronic eigenvalue $E_k(x)$, called the potential energy surface, depends on the positions $x$ of the nuclei.

Next, the total wave function $\Phi(t, x, y)$ is expanded in terms of the eigenfunctions $\{\psi_k\}$:

$$\Phi(t, x, y) = \sum_k \phi_k(t, x)\, \psi_k(y; x). \qquad (3)$$

Assume $m_j = m$ and $M_j = M$ for all $j$. We adopt atomic units by setting $\hbar = 1$, $Z = 1$, and introduce $\varepsilon = \sqrt{m/M}$. Typically $\varepsilon$ ranges between $10^{-2}$ and $10^{-3}$. Inserting the ansatz (3) into the time-dependent Schrödinger equation (1), multiplying all the terms from the left by $\psi_k^*(y; x)$, and integrating with respect to $y$, one obtains a set of coupled differential equations:

$$i \varepsilon\, \frac{\partial}{\partial t} \phi_k(t, x) = \left[ -\sum_{j=1}^{N} \frac{\varepsilon^2}{2}\, \Delta_{x_j} + E_k(x) \right] \phi_k(t, x) + \sum_l C_{kl}\, \phi_l(t, x), \qquad (4)$$

where the coupling operators $C_{kl}$ are important to describe quantum transitions between different potential energy surfaces.

As long as the potential energy surfaces $\{E_k(x)\}$ are well separated, all the coupling operators $C_{kl}$ are ignored, and one obtains a set of decoupled Schrödinger equations:

$$i \varepsilon\, \frac{\partial}{\partial t} \phi_k(t, x) = \left[ -\sum_{j=1}^{N} \frac{\varepsilon^2}{2}\, \Delta_{x_j} + E_k(x) \right] \phi_k(t, x), \qquad (t, x) \in \mathbb{R}^+ \times \mathbb{R}^{3N}. \qquad (5)$$

Thus the nuclear motion proceeds without transitions between electronic states or energy surfaces. This is also referred to as the adiabatic approximation.
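On a single energy surface, equation (5) is commonly advanced in time by the splitting spectral schemes discussed in the next section. The following minimal one-dimensional sketch (Strang splitting on a periodic grid; the harmonic energy surface, Gaussian data, and the value of $\varepsilon$ are purely illustrative assumptions) alternates an exact potential step with an exact kinetic step in Fourier space:

```python
import numpy as np

def strang_step(phi, E, k, eps, dt):
    """One Strang-splitting step for i*eps*phi_t = -(eps^2/2)*phi_xx + E*phi.

    Potential half-step, exact kinetic step in Fourier space, potential
    half-step; 1D, periodic boundary conditions.
    """
    phi = np.exp(-0.5j * dt * E / eps) * phi
    phi = np.fft.ifft(np.exp(-0.5j * eps * dt * k**2) * np.fft.fft(phi))
    phi = np.exp(-0.5j * dt * E / eps) * phi
    return phi

# Harmonic energy surface E(x) = x^2/2 on [-pi, pi), eps = 1e-2.
eps, L, n = 1e-2, 2 * np.pi, 1024
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
E = 0.5 * x**2
phi = (np.pi * eps) ** (-0.25) * np.exp(-(x - 0.5) ** 2 / (2 * eps))
for _ in range(1000):
    phi = strang_step(phi, E, k, eps, dt=1e-3)
```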

There are two components in dealing with a quantum calculation. First, one has to solve the eigenvalue problem (2). Variational methods are usually used [10]. However, for large $n$, this remains an intractable task. Various mean-field theories have been developed to reduce the dimension. In particular, the Hartree–Fock theory [9] and density functional theory [8] aim at representing the $3n$-dimensional electronic wave function as a product of one-particle wave functions in 3 dimensions. These approaches usually yield nonlinear Schrödinger equations.

The Semiclassical Limit

The second component in quantum simulation is to solve the time-dependent Schrödinger equation (5). Its numerical approximation per se is similar to that of the parabolic heat equation. Finite difference, finite element, or, most frequently, spectral methods can be used for the spatial discretization. For the time discretization, one often takes a time splitting of Strang or Trotter type that separates the kinetic energy from the potential operators in alternating steps (see the sketch after (5) above). However, due to the smallness of $\hbar$ or $\varepsilon$, the numerical resolution of the wave function remains difficult.

A classical method to deal with such an oscillatory wave problem is the WKB method, which seeks a solution of the form $\phi(t,x) = A(t,x)\, e^{i S(t,x)/\varepsilon}$ (in the sequel, we consider only one energy level in (5), thus omitting the subscript $k$ and replacing $E$ by $V$). If one applies this ansatz in (5), ignoring the $O(\varepsilon)$ term, one obtains the eikonal equation for the phase $S$ and the transport equation for the amplitude $A$:

$$\partial_t S + \frac{1}{2}\, |\nabla S|^2 + V(x) = 0; \qquad (6)$$

$$\partial_t A + \nabla S \cdot \nabla A + \frac{A}{2}\, \Delta S = 0. \qquad (7)$$

The eikonal equation (6) is a typical Hamilton–Jacobi equation, which develops singularities in $S$, usually referred to as caustics in the context of geometric optics. Beyond the singularity, one has to superimpose the solutions of $S$, each of which satisfies the eikonal equation (6), since the solution becomes multivalued [6].


This equation can be solved by the method of characteristics, provided that $V(x)$ is sufficiently smooth. Its characteristic flow is given by the following Hamiltonian system of ordinary differential equations, which is Newton's second law:

$$\frac{dx}{dt}(t, y) = \xi(t, y); \qquad \frac{d\xi}{dt}(t, y) = -\nabla_x V(x(t, y)). \qquad (8)$$

(8)Another approach to study the semiclassical limit is

the Wigner transform [12]:

w"Œ�"�.x; / WD 1

.2�/n

ZRn

x C "

2���

x � "

2��ei��d� ; (9)

which is a convenient tool to study the limit of �.t; x/to obtain the classical Liouville equation:

@tw C � rw � rV.x/ � rw D 0 : (10)

Its characteristic equation is given by the Hamiltonian system (8).
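As an illustration, here is a minimal sketch of integrating the characteristic system (8) with the Störmer-Verlet scheme; the cosine potential and the initial bundle of characteristics are illustrative assumptions.

```python
import numpy as np

def grad_V(x):
    """Gradient of the (illustrative) smooth potential V(x) = cos(x)."""
    return -np.sin(x)

def verlet(x, xi, dt, nsteps):
    """Stoermer-Verlet integration of dx/dt = xi, dxi/dt = -grad V(x)."""
    for _ in range(nsteps):
        xi = xi - 0.5 * dt * grad_V(x)   # half kick
        x = x + dt * xi                  # drift
        xi = xi - 0.5 * dt * grad_V(x)   # half kick
    return x, xi

# Bundle of characteristics x(t, y) starting from points y with momentum 1
y = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
x_t, xi_t = verlet(y.copy(), np.ones_like(y), dt=1e-2, nsteps=500)
```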

Various Potentials

The above Wigner analysis works well if the potential $V$ is smooth. In applications, $V$ can be discontinuous (corresponding to potential barriers), periodic (for solid mechanics with lattice structure), random (with an inhomogeneous background), or even nonlinear (where the equation is a field equation, with applications to optics and water waves, or a mean-field equation as an approximation of the original multiparticle linear Schrödinger equation). Different approaches need to be taken for each of these cases.
• $V$ is discontinuous. Through a potential barrier, the quantum tunnelling phenomenon occurs, and one has to handle wave transmission and reflections [3].
• $V$ is periodic. The Bloch decomposition is used to decompose the wave field into sub-fields along each of the Bloch bands, which are the eigenfunctions associated with a perturbed Hamiltonian that includes $H$ plus the periodic potential [13].
• $V$ is random. Depending on the space dimension and the strength of the randomness, the waves can be localized [2] or diffusive. In the latter case, the Wigner transform can be used to study the high-frequency limit [7].
• $V$ is nonlinear. The semiclassical limit (10) fails after caustic formation. The understanding of this limit for strong nonlinearities remains a major mathematical challenge. Not much is known except in the one-dimensional defocusing nonlinearity case ($V = |\phi|^2$) [4].

In (4), when different $E_k$ intersect, or get close, one cannot ignore the quantum transitions between different energy levels. A semiclassical approach, known as the surface hopping method, was developed by Tully. It is based on the classical Hamiltonian system (8), with a Monte Carlo procedure to account for the quantum transitions [11].

For a survey of semiclassical computational methods for the Schrödinger equation, see [5].

References

1. Born, M., Oppenheimer, R.: Zur Quantentheorie der Molekeln. Ann. Phys. 84, 457–484 (1927)
2. Fröhlich, J., Spencer, T.: Absence of diffusion in the Anderson tight binding model for large disorder or low energy. Commun. Math. Phys. 88, 465–471 (1983)
3. Griffiths, D.J.: Introduction to Quantum Mechanics, 2nd edn. Prentice Hall, Upper Saddle River (2004)
4. Jin, S., Levermore, C.D., McLaughlin, D.W.: The semiclassical limit of the defocusing NLS hierarchy. Commun. Pure Appl. Math. 52(5), 613–654 (1999)
5. Jin, S., Markowich, P., Sparber, C.: Mathematical and computational methods for semiclassical Schrödinger equations. Acta Numer. 20, 211–289 (2011)
6. Maslov, V.P., Fedoriuk, M.V.: Semi-Classical Approximation in Quantum Mechanics. D. Reidel, Dordrecht/Holland (1981)
7. Papanicolaou, G., Ryzhik, L.: Waves and transport. In: Caffarelli, L., Weinan, E. (eds.) Hyperbolic Equations and Frequency Interactions (Park City, UT, 1995). Amer. Math. Soc., Providence, RI 5, 305–382 (1999)
8. Parr, R.G., Yang, W.: Density-Functional Theory of Atoms and Molecules. Oxford University Press, New York (1989)
9. Szabo, A., Ostlund, N.S.: Modern Quantum Chemistry. Dover, Mineola/New York (1996)
10. Thijssen, J.M.: Computational Physics. Cambridge University Press, Cambridge (1999)
11. Tully, J.: Molecular dynamics with electronic transitions. J. Chem. Phys. 93, 1061–1071 (1990)
12. Wigner, E.: On the quantum correction for thermodynamic equilibrium. Phys. Rev. 40, 749–759 (1932)
13. Wilcox, C.H.: Theory of Bloch waves. J. Anal. Math. 33, 146–167 (1978)


Scientific Computing

Hans Petter Langtangen1,2, Ulrich Rüde3, and Aslak Tveito1,2
1 Simula Research Laboratory, Center for Biomedical Computing, Fornebu, Norway
2 Department of Informatics, University of Oslo, Oslo, Norway
3 Department of Computer Science, University Erlangen-Nuremberg, Erlangen, Germany

Scientific Computing is about practical methods for solving mathematical problems. One may argue that the field goes back to the invention of Mathematics, but today, the term Scientific Computing usually means the application of computers to solve mathematical problems. The solution process consists of several key steps, which form the so-called simulation pipeline:
1. Formulation of a mathematical model which describes the scientific problem and is suited for computer-based solution methods and some chosen hardware
2. Construction of algorithms which precisely describe the computational steps in the model
3. Implementation of the algorithms in software
4. Verification of the implementation
5. Preparation of input data to the model
6. Analysis and visualization of output data from the model
7. Validation of the mathematical model, with associated parameter estimation
8. Estimation of the precision (or uncertainty) of the mathematical model for predictions
In any of the steps, it might be necessary to go back and change the formulation of the model or previous steps to make further progress with other items on the list. When this process of iterative improvement has reached a satisfactory state, one can perform computer simulations of a process in nature, technological devices, or society. In essence, that means using the computer as a laboratory to mimic processes in the real world. Such a lab enables impossible, unethical, or costly real-world experiments, but often the process of developing a computer model gives increased scientific insight in itself. The disadvantage of computer simulations is that the quality of the results, or more precisely the quantitative prediction capabilities of the simulations, may not be well established.

Relations to Other Fields

A term closely related to Scientific Computing (and one that Wikipedia actually treats as a synonym) is Computational Science, which we here define as solving a scientific problem with the aid of techniques from Scientific Computing. While Scientific Computing deals with solution techniques and tools, Computational Science has a stronger focus on the science, that is, a scientific question and the significance of the answer. In between these focal points, the craft of Scientific Computing is fundamental in order to produce an answer. Scientific Computing and Computational Science are developing into an independent scientific discipline; they combine elements from Mathematics and Computer Science to form the foundations for a new methodology of scientific discovery. Developing Computational Science and Scientific Computing may turn out to be as fundamental to future progress in science as was the development of novel Mathematics in the times of Newton and Euler.

Another closely related term is Numerical Analysis, which is about the "development of practical algorithms to obtain approximate solutions of mathematical problems and the validation of these solutions through their mathematical analysis." The development of algorithms is central to both Numerical Analysis and Scientific Computing, and so is the validation of the computed solutions, but Numerical Analysis has a particular emphasis on the mathematical analysis of the accuracy of approximate algorithms. Some narrower definitions of Scientific Computing would say that it contains all of Numerical Analysis, but in addition applies more experimental computing techniques to evaluate the accuracy of the computed results. Scientific Computing is not necessarily restricted to approximate solution methods, although those are the most widely used. Other definitions of Scientific Computing may include additional points from the list above, up to our definition, which is wide and includes all the key steps in creating predictive computer simulations.

The term Numerical Analysis seems to have appeared in 1947 when the Institute for Numerical Analysis was set up at UCLA with funding from the National Bureau of Standards and the Office of Naval Research.


A landmark for the term Scientific Computing dates back to 1980 when Gene Golub established the SIAM Journal on Scientific Computing. Computational Science, and Computational Science and Engineering, became widely used terms during the late 1990s. The book series Lecture Notes in Computational Science and Engineering was initiated in 1995 and published from 1997 (but the Norwegian University of Science and Technology proposed a professorship in Computational Science as early as 1993). The Computational Science term was further popularized by the conferences SIAM Conference on Computational Science and Engineering (from 2000) and the International Conference on Computational Science (ICCS, from 2001). Many student programs with the same names appeared at the turn of the century.

In Numerical Analysis the guiding principle is to perform computations based on a strong theoretical foundation. This foundation includes a proof that the algorithm under consideration is computable and a statement of how accurate the computed solution will be. Suppose, for instance, that our aim is to solve a system of algebraic equations using an iterative method. If we have an algorithm providing a sequence of approximations $\{x_i\}$, the basic questions of Numerical Analysis are (i) (existence) to prove that $x_{i+1}$ can be computed provided that $x_i$ has already been computed and (ii) (convergence) to determine how accurate the $i$-th approximation is. In general, these two steps can be extremely challenging. Earlier, the goal of having a solid theoretical basis for algorithms was frequently realistic since the computers available did not have sufficient power to address very complex problems. That situation has changed dramatically in recent years, and we are now able to use computers to study problems that are way beyond the realm of Numerical Analysis.

Scientific Computing is the discipline that takes over where a complete theoretical analysis of the algorithms involved is impossible. Today, Scientific Computing is an indispensable tool in science, and the models, methods, and algorithms under consideration are rarely accessible by analytical tools. This lack of theoretical rigor is often addressed by using extensive, carefully conducted computer experiments to investigate the quality of computed solutions. More standardized methods for such investigations are an important integral part of Scientific Computing.

Scientific Computing: Mathematics or Computer Science?

Universities around the world are organized in departments covering a reasonable portion of science or engineering in a fairly disjoint manner. This organizational structure has caused much headache among researchers in Numerical Analysis and Scientific Computing because these fields typically would have to find their place either in a Computer Science department or in a Mathematics department, while Scientific Computing and Numerical Analysis belong in part to both these disciplines. Heated arguments have taken place around the world, and so far no universal solution has been provided. The discussion may, however, be used to illustrate the validity of Sayre's law (from the first lines of wikipedia.org/wiki/Sayres_law: Sayre's law states, in a formulation quoted by Charles Philip Issawi: "In any dispute the intensity of feeling is inversely proportional to the value of the issues at stake." By way of corollary, it adds: "That is why academic politics are so bitter."), which is often attributed to Henry Kissinger.

Formulation of Mathematical Models

The demand for a mathematical model comes from the curiosity or need to answer a scientific question. When the model is finally available for computer simulations, the results very often lead to reformulation of the model or of the scientific question. This iterative process is the essence of doing science with the aid of mathematical models.

Although some may claim that the formulation of mathematical models is an activity that belongs to Engineering and the classical sciences, and that the models are prescribed in Scientific Computing, we will argue that this is seldom the case. The classical scientific subjects (e.g., Physics, Chemistry, Biology, Statistics, Mathematics, Computer Science, Economics) do formulate mathematical models, but the traditional focus targets models suitable for analytical insight. Models suitable for being run on computers require mathematical formulations adapted to the technical steps of the solution process. Therefore, experts on Scientific Computing will often go back to the model and reformulate it to improve steps in the solution process. In particular, it is important to formulate models that fit approximation, software, hardware, and parameter estimation constraints. Such aspects of formulating models have


to a large extent been developed over the last decades through Scientific Computing research and practice. Successful Scientific Computing therefore demands a close interaction between understanding of the phenomenon under interest (often referred to as domain knowledge) and the techniques available in the solution process. Occasionally, intimate knowledge about the application and the scientific question enables the use of special properties that can reduce computing time, increase accuracy, or just simplify the model considerably.

To illustrate how the steps of computing impact modeling, consider flow around a body. In Physics (or traditional Fluid Dynamics, to be more precise), one frequently restricts the development of a model to what can be treated by pen-and-paper Mathematics, which in the current example means the assumption of stationary laminar flow, that the body is a sphere, and that the domain is infinite. When analyzing flow around a body through computer simulations, a time-dependent model may be easier to implement and faster to run on modern parallel hardware, even if only a stationary solution is of interest. In addition, a time-dependent model allows the development of instabilities and the well-known oscillating vortex patterns behind the body that occur even if the boundary conditions are stationary. A difficulty with the discrete model is the need for a finite domain and appropriate boundary conditions that do not disturb the flow inside the domain (so-called Artificial Boundary Conditions). In a discrete model, it is also easy to allow for flexible body geometry and relatively straightforward to include models of turbulence. What kind of turbulence model to apply might be constrained by implementational difficulties, details of the computer architecture, or computing time feasibility. Another aspect that impacts the modeling is the estimation of the parameters in the model (usually done by solving inverse problems; see Inverse Problems: Numerical Methods). Large errors in such estimates may favor a simpler and less accurate model over a more complex one where the uncertainty in unknown parameters is greater. The conclusion on which model to choose depends on many factors and ultimately on how one defines and measures the accuracy of predictions.

Some common ingredients in mathematical models are integration; approximation of functions (curve and surface fitting); optimization of functions or functionals; matrix systems; eigenvalue problems; systems of nonlinear equations; graphs and networks; Numerical Analysis of Ordinary Differential Equations; Computational Partial Differential Equations; integral equations; dynamical systems theory; stochastic variables, processes, and fields; random walks; and imaging techniques. The entries on Numerical Analysis, Computational Partial Differential Equations, and Numerical Analysis of Ordinary Differential Equations provide a more detailed overview of these topics and associated algorithms.

Discrete Models and Algorithms

Mathematical models may be continuous or discrete. Continuous models can be addressed by symbolic computing; otherwise (and usually) they must be made discrete through discretization techniques. Many physical phenomena leave a choice between formulating the model as continuous or discrete. For example, a geological material can be viewed as a finite set of small elements in contact (discrete element model) or as a continuous medium with prescribed macroscopic material properties (continuum mechanical model). In the former case, one applies a set of rules for how elements interact at a mesoscale level and ends up with a large system of algebraic equations that must be solved, or sometimes one can derive explicit formulas for how each element moves during a small time interval. Another discrete modeling approach is based on cellular automata, where physical relations are described between a fixed grid of cells. The Lattice Boltzmann method, for example, uses 2D or 3D cellular automata to model the dynamics of a fluid on a mesoscopic level. Here the interactions between the states of neighboring cells are derived from the principles of statistical mechanics.

With a continuous medium, the model is expressed in terms of partial differential equations (with appropriate initial and boundary conditions). These equations must be discretized by techniques like Finite Difference Methods, Finite Element Methods, or Finite Volume Methods, which lead to systems of algebraic equations. For a purely elastic medium, one can formulate mathematically equivalent discrete and discretized continuous models, while for complicated material behavior the two model classes have their pros and cons. Some will prefer a discrete element model


because it often has fewer parameters to estimate than a continuum mechanical constitutive law for the material.

Some models are spatially discrete but continuous in time. Examples include Molecular Dynamics and planetary systems, while others are purely discrete, like the network of Facebook users.

The fundamental property of a discrete model, either originally discrete or a discretized continuous model, is its suitability for a computer. It is necessary to adjust the computational work, which is usually closely related to the accuracy of the discrete model, to fit the given hardware and the acceptable time for calculating the solution. The choice of discretization is also often dictated by software considerations. For example, one may prefer to discretize a partial differential equation by a finite difference method rather than a finite element method because the former is much simpler to implement and thus may lead to more efficient software and thus eventually more accurate simulation results.

The entries on Numerical Analysis, Computational Partial Differential Equations, Numerical Analysis of Ordinary Differential Equations, and Imaging present overviews of different discretization techniques, and more specialized articles go deeper into the various methods.

The accuracy of the discretization is normally the most important factor that governs the choice of technique. Discretized continuous models are based on approximations, and quantifying the accuracy of these approximations is a key ingredient in Scientific Computing, as well as techniques for assessing their computational cost. The field of Numerical Analysis has developed many mathematical techniques that can help establish a priori or a posteriori bounds on the errors in numerous types of approximations. The former bound the error by properties of the exact solution, while the latter apply the approximate (i.e., the computed) solution in the bound. Since the exact solution of the mathematical problem remains unknown, predictive models must usually apply a posteriori estimates in an iterative fashion to control approximation errors.

When mathematical expressions for or bounds of the errors are not obtainable, one has to resort to experimental investigations of approximation errors. Popular techniques for this purpose have arisen from verification methods (see below).

With Symbolic Computing one can bring additional power to exact and approximate solution techniques based on traditional pen-and-paper Mathematics. One example is perturbation methods, where the solution is expressed as a power series of some dimensionless parameter, and one develops a hierarchy of models for determining the coefficients in the power series. The procedure originally involves lengthy analytical computations by hand, which can be automated using symbolic computing software such as Mathematica, Maple, or SymPy.

When the mathematical details of the chosen discretization are worked out, it remains to organize those details in algorithms. The algorithms are computational recipes that bridge the gap between the mathematical description of discrete models and their associated implementation in software. Proper documentation of the algorithms is extremely important, such that others know all the ingredients of the computer model on which scientific findings are based. Unfortunately, the details of complex models that are routinely used for important industrial or scientific applications may sometimes be available only through the actual computer code, which might even be proprietary.

Many of the mathematical subproblems that arise from a model can be broken into smaller problems for which there exist efficient algorithms and implementations. This technique has historically been tremendously effective in Mathematics and Physics. Also in Scientific Computing one often sees that the best way of solving a new problem is to create a clever glue between existing building blocks.

Implementation

Ideally, a well-formulated set of algorithms should easily translate into computer programs. While this is true for simple problems, it is not in the general case. A large portion of many budgets for Scientific Computing projects goes to software development. With increasingly complicated models, the complexity of the computer programs appears to grow even faster, because computer languages were not designed to easily express complicated mathematical concepts. This is one main reason why writing and maintaining scientific software is challenging.


The fundamental challenge of developing correct and efficient software for Scientific Computing is notoriously underestimated. It has an inherent complexity that cannot be addressed by automated procedures alone, but must be acknowledged as an independent scientific and engineering problem. Also, testing of the software quickly consumes even more resources. Software Engineering is a central field of Computer Science which addresses techniques for developing and testing software systems in general, but has so far had minor impact on scientific software. We can identify three reasons. First, the structure of the software is closely related to the mathematical concepts involved. Second, testing is very demanding since, for most relevant applications, we do not know the answer beforehand. Actually, much scientific software is written to explore new phenomena where neither qualitative nor quantitative properties of the computed results are known. Even when analytical insight about the solution is available, the computed results will contain unknown discretization errors. The third reason is that scientific software quickly consumes the largest computational resources available and hence employs special High-Performance Computing (HPC) platforms. These platforms imply that computations must run in parallel on heterogeneous architectures, a fact that seriously complicates the software development. There has traditionally been little willingness to adopt good Software Engineering techniques if they cause any loss of computational performance (which is normally the case).

In the first decades of Scientific Computing, FORTRAN was the dominating computer language for implementing algorithms, and FORTRAN still has a strong position. Classical FORTRAN (with the dialects IV, 66, and 77) is ideal for mathematical formulas and heavy array computations, but lacks more sophisticated features like classes, namespaces, and modules to elegantly express complicated mathematical concepts. Therefore, the much richer C++ language attracted significant attention in scientific software projects from the mid-1990s. Today, C++ is a dominating language in new projects, although the recent FORTRAN 2003/2008 has many of the features that made C++ popular. C, C++, and FORTRAN enable the programmer to utilize almost the maximum efficiency of an HPC architecture. On the other hand, much less computationally efficient languages such as MATLAB, Mathematica, and Python have reached considerable popularity for implementing Scientific Computing algorithms. The reason is that these languages are more high level; that is, they allow humans to write computer code closer to the mathematical concepts than what is easily achievable with C, C++, and FORTRAN.

Over the last four decades, numerous high-quality libraries have been developed, especially for frequently occurring problems from numerical linear algebra, differential equations, approximation of functions, optimization, etc. Development of new software today will usually maximize the utilization of such well-tested libraries. The result is a heterogeneous software environment that involves several languages and software packages, often glued together in easy-to-use and easy-to-program applications in MATLAB or Python.

If we want to address complicated mathematical models in Scientific Computing, the software needs to provide the right abstractions to ease the implementation of the mathematical concepts. That is, the step from the mathematical description to the computer code must be minimized under the constraint of minor performance loss. This constrained optimization problem is the great challenge in developing scientific software.

Most classical mathematical methods are serial, but utilization of modern computing platforms requires algorithms to run in parallel. The development of algorithms for Parallel Computing is one of the most significant activities in Scientific Computing today. Implementation of parallel algorithms, especially in combination with high-level abstractions for complicated mathematical concepts, is an additional research challenge. Easy-to-use parallel implementations are needed if a broad audience of scientists shall effectively utilize modern HPC architectures such as clusters with multi-core and multi-GPU PCs. Fortunately, many of the well-known libraries for, e.g., linear algebra and optimization have been updated to perform well on modern hardware.

Often scientific progress is limited by the available hardware capacity in terms of memory and computational power. Large-scale projects can require expensive resources, where not only the supercomputers per se but also their operational cost become limiting factors. Here it becomes mandatory that the accuracy provided by a Scientific Computing methodology is evaluated relative to its cost. Traditional approaches that just quantify the number of numerical operations often turn out to be misleading. Worse than that,


theoretical analysis often provides only asymptotic bounds for the error with unspecified constants. This translates to cost assessments with unspecified constants that are of only little use for quantifying the real cost to obtain a simulation result. In Scientific Computing, such mathematical techniques must be combined with more realistic cost predictions to guide the development of effective simulation methods. This important research direction comes under names such as Hardware-Oriented Numerics for PDE or systematic performance engineering, and is usually based on a combination of rigorous mathematical techniques with engineering-like heuristics. Additionally, technological constraints, such as the energy consumption of supercomputers, are increasingly found to become critical bottlenecks. This in turn motivates genuinely new research directions, such as evaluating the numerical efficiency of an algorithm in terms of its physical resource requirements, like the energy usage.

Verification

Verification of scientific software means setting up a series of tests to bring evidence that the software solves the underlying mathematical problems correctly. A fundamental challenge is that the problems are normally solved approximately, with an error that is quantitatively unknown. Comparison with exact mathematical results will in those cases yield a discrepancy, but the purpose of verification is to ensure that there are no additional nonmathematical discrepancies caused by programming mistakes.

The ideal tests for verification are to have exact solutions of the discrete problems. The discrepancy between such solutions and those produced by the software should be limited by (small) roundoff errors due to Finite Precision Arithmetic in the machine. The standard verification test, however, is to use exact solutions of the mathematical problems to compute observed errors and check that these behave correctly. More precisely, one develops exact solutions for the mathematical problem to be solved, or a closely related one, and establishes a theoretical model for the errors. Error bounds from Numerical Analysis will very often suggest error models. For each problem one can then vary discretization parameters to generate a data set of errors and see if the relation between the errors and the discretization parameters is as expected from the error model. This strategy constitutes perhaps the most important verification technique and demonstrates how dependent software testing is on results from Numerical Analysis.
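As a concrete illustration of this strategy, here is a minimal sketch assuming the common error model $E(h) \approx C h^r$; the sample errors below are placeholders standing in for errors measured against an exact solution on successively refined grids.

```python
import numpy as np

def observed_rates(h, errors):
    """Estimate the rate r in the error model E(h) = C * h**r from
    consecutive refinements: r_i = log(E_i/E_{i+1}) / log(h_i/h_{i+1})."""
    h, errors = np.asarray(h, float), np.asarray(errors, float)
    return np.log(errors[:-1] / errors[1:]) / np.log(h[:-1] / h[1:])

# Placeholder data: errors measured against an exact solution
h = [0.1, 0.05, 0.025, 0.0125]
errors = [2.1e-2, 5.4e-3, 1.4e-3, 3.4e-4]
print(observed_rates(h, errors))   # hovers around 2 for a second-order method
```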

Analytical insight from alternative or approximate mathematical models can in many physical problems be used to test that certain aspects of the solution behave correctly. For example, one may have asymptotic results for the solution far away from the locations where the main physics is generated. Moreover, principles such as mass, momentum, and energy balance for the whole system under consideration can also be checked. These types of tests are not as effective for uncovering software bugs as the tests described above, but add evidence that the software works.

Scientific software needs to compute correct numbers, but must also run fast. Tests for checking that the computational speed has not changed unexpectedly are therefore an integral part of any test suite. Especially on parallel computing platforms, this type of efficiency test is as important as the correctness tests.

A fundamental requirement of verification procedures is that all tests are automated and can at any time be repeated. Version control systems for keeping track of different versions of the files in a software package can be integrated with automatic testing such that every registered change in the software triggers a run of the test suite, with effective reports to help track down new errors. It is also easy to roll back to previous versions of the software that passed all tests. Science papers that rely heavily on Scientific Computing should ideally point to web sites where the version history of the software and the tests are available, preferably also with snapshots of the whole computing environment where the simulation results were obtained. These elements are important for the reproducibility of the scientific findings (Reproducibility: Methods).

Preparation of Input Data

With increasingly complicated mathematical models, the preparation of input data for such models has become a very resource-consuming activity. One example is the computation of the drag (fuel consumption) of a car, which demands a mathematical description of the car's surface and a division of the air space outside the car into small hexahedra or tetrahedra. Techniques of Geometry Processing can be used to measure


and mathematically represent the car's surface, while Meshing Algorithms are used to populate the air flow domain with small hexahedra or tetrahedra. An even greater challenge is met in biomedical or geophysical computing, where Segmentation Methods must normally be accompanied by human interpretation when extracting geometries from noisy images.

A serious difficulty of preparing input data is related to lack of knowledge of certain data. This situation requires special methods for parameter estimation, as described below.

Visualization of Output Data

Computer simulations tend to generate large amounts of numbers, which are meaningless to the scientists unless the numbers are transformed into informative pictures closely related to the scientific investigation at hand. This is the goal of Visualization. The simplest and often most effective type of visualization is to draw curves relating key quantities in the investigation. Frequently, curves give incomplete insight into processes, and one needs to visualize more complex objects such as big networks or time-dependent, three-dimensional scalar or vector fields. Even if the goal of simulating a car's fuel consumption is a single number for the drag force, any physical insight into enhancing geometric features of the car requires detailed visualization of the air velocities, the pressure, and vortex structures in time and 3D space. Visualization is partly about advanced numerical algorithms and partly about visual communication. The importance of effective visualization in Scientific Computing can hardly be overestimated, as it is a key tool in software debugging, scientific investigations, and communication of the main research findings.

Validation and Parameter Estimation

While verification is about checking that the algorithms and their implementations are done right, validation is about checking that the mathematical model is relevant for predictions. When we create a mathematical model for a particular process in nature, a technological device, or society, we think of a forward model in the meaning that the model requires a set of input data and can produce a set of output data. The input data must be known for the output data to be computed. Very often this is not the case, because some input data remain unknown, while some output data are known or can be measured. And since we lack some input data, we cannot run the forward model. This situation leads to a parameter estimation problem, also known as a model calibration problem, a parameter identification problem, or an inverse problem (Inverse Problems: Numerical Methods). The idea is to use some of the known output data to estimate some of the lacking input data with the aid of the model. Forward models are for the most part well posed in the sense that small errors in input data are not amplified significantly in the output. Inverse problems, on the other hand, are normally ill posed: small errors in the measured output may have severe impact on the precision of our estimates of input data. Much of the methodology research is about reducing the ill posedness.

Validation consists in establishing evidence that the computer model really models the real phenomena we are interested in. The idea is to have a range of test cases, each with some known output, usually measured in physical experiments, and to check that the model reproduces the known output. The tests are straightforwardly conducted if all input data are known. However, very often some input parameters in the model are unknown, and the typical validation procedure for a given test case is to tune those parameters in a way that reproduces the known output. Provided that the tuned parameters are within realistic regions, the model passes the validation test in the sense that there exists relevant input data such that the model predicts the observed output.

Understanding of the process being simulated can effectively guide manual tuning of unknown input parameters. Alternatively, many different numerical methods exist for automatically fitting input parameters. Most of them are based on constrained optimization, where one wants to minimize the squared distance between predicted and observed output with respect to the unknown input parameters, given the constraint that any predicted value must obey the model. Numerous specific solution algorithms are found in the literature on deterministic inverse problems (Inverse Problems: Numerical Methods). Usually, the solution of an inverse problem requires a large number of solutions of the corresponding forward problem.
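As a concrete (and deliberately tiny) sketch of this squared-misfit minimization, the following fits two unknown input parameters of a hypothetical forward model $u(t;p) = p_0 e^{-p_1 t}$ to noisy observations; the model, the data, and the parameter values are illustrative assumptions. Note that the optimizer evaluates the forward model many times, echoing the remark above.

```python
import numpy as np
from scipy.optimize import least_squares

def forward(p, t):
    """Toy forward model with unknown input parameters p = (amplitude, rate)."""
    return p[0] * np.exp(-p[1] * t)

rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 2.0, 20)
u_obs = forward([1.5, 0.8], t_obs) + 0.01 * rng.standard_normal(t_obs.size)

# Minimize the squared misfit between predicted and observed output
fit = least_squares(lambda p: forward(p, t_obs) - u_obs, x0=[1.0, 1.0])
print(fit.x)   # estimated parameters, close to (1.5, 0.8)
```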


Many scientific questions immediately lead to inverse problems. Seismic imaging is an example where one aims to estimate the spatial properties of the Earth's crust using measurements of reflected sound waves. Mathematically, the unknown properties are spatially varying coefficients in partial differential equations, and the measurements contain information about the solution of the equations. The primary purpose is to estimate the value of the coefficients, which in a forward model for wave motion constitute input data that must be known. When the focus is on solving the inverse problem itself, one can often apply simpler forward models (in seismic imaging, e.g., ordinary differential equations for ray tracing have traditionally replaced full partial differential equations for the wave motion as the forward model).

Reliable estimation of parameters requires more observed data than unknown input data. Then we can search for the best fit of parameters, but there is no unique definition of what "best" is. Furthermore, measured output data are subject to measurement errors. It is therefore fruitful to acknowledge that the solution of inverse problems has a variability. Control of this variability gives parameter estimates with corresponding statistical uncertainties. For this purpose, one may turn to solving stochastic inverse problems. These are commonly formulated in a Bayesian framework (Bayesian Statistics: Computation), where probability densities for the input parameters are suggested, based on prior knowledge, and the framework updates these probability densities by taking the forward model and the data into account. The inserted prior knowledge handles the ill posedness of deterministic inverse problems, but at a much increased computational cost.

Uncertainty Quantification

With stochastic parameter estimation we immediately face the question: How does the uncertainty in estimated parameters propagate through the model? That is, what is the uncertainty in the predicted output? Methods from Uncertainty Quantification: Computation can be used to answer this question. If parameters are estimated by Bayesian frameworks, we have complete probability descriptions. With simpler estimation methods we may still want to describe uncertainty in the parameters in terms of assumed probability densities.

The simplest and most general method for uncertainty quantification is Monte Carlo simulation. Large samples of input data are drawn at random from the known probability densities and fed as input to the model. The forward model is run to compute the output corresponding to each sample. From the samples of output data, one can compute the average, variance, and other statistical measures of the quantities of interest. One must often use on the order of $10^5$–$10^7$ samples (and hence runs of the forward model) to compute reasonably precise statistics. Much faster but less general methods exist. During recent years, Polynomial Chaos Expansions have become popular. These assume that the mapping from stochastic input to selected output quantities is smooth, such that the mapping can be effectively described by a polynomial expansion with few terms. The expansion may converge exponentially fast and reduce the number of runs of the forward model by several orders of magnitude.
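A minimal sketch of the Monte Carlo procedure just described; the forward model and the input densities below are illustrative assumptions standing in for a real simulator.

```python
import numpy as np

def forward_model(p):
    """Placeholder forward model mapping uncertain inputs to a scalar output."""
    return np.sin(p[..., 0]) + 0.5 * p[..., 1] ** 2

rng = np.random.default_rng(0)
nsamples = 10**5   # the text suggests 10^5 - 10^7 forward runs
samples = np.column_stack([rng.normal(1.0, 0.1, nsamples),    # assumed density, input 1
                           rng.uniform(0.0, 1.0, nsamples)])  # assumed density, input 2
outputs = forward_model(samples)
print(outputs.mean(), outputs.var())   # statistics of the quantity of interest
```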

Having validated the model and estimated the uncertainty in the output, we can eventually perform predictive computer simulations and calculate the precision of the predictions. At this point we have completed the final item in our list of key steps in the simulation pipeline.

We remark that although the stochastic parameter estimation framework naturally turns an originally deterministic model into a stochastic one, modelers may early in the modeling process assume that the details of some effect are not precisely known and therefore describe the effect as a stochastic quantity. Some input to the model is then stochastic, and the question is how the statistical variability is propagated through the model. This gives basically the same computational problem as in uncertainty quantification. One example where stochastic quantities are used directly in the modeling is environmental forces from wind and waves on structures. The forces may be described as stochastic space-time fields with statistical parameters that must be estimated from measurements.

Laboratory and Field Experiments

The workflow in Scientific Computing to establish predictive simulation models is seen to involve knowledge from several subjects, clearly Applied and Numerical Mathematics, Statistics, and Computer Science, but the mathematical techniques and software work must be closely integrated with domain-specific modeling knowledge from the field where the science problem originates, say Physics, Mechanics, Geology, Geophysics, Astronomy, Biology, Finance, or Engineering disciplines. A less emphasized integration is with laboratory and field experiments, as Scientific Computing is often applauded for eliminating the need for experiments. However, we have explained that predictive simulations require validation, parameter estimation, and control of the variability of input and output data. The computations involved in these tasks cannot be carried out without access to carefully conducted experiments in the laboratory or the field. A hope is that an extensive amount of large-scale data sets from experiments can be made openly available to all computational scientists and thereby accelerate the integration of experimental data in Scientific Computing.

Self-Consistent Field (SCF) Algorithms

Eric Cancès
Ecole des Ponts ParisTech – INRIA, Université Paris Est, CERMICS, Projet MICMAC, Marne-la-Vallée, France

Definition

Self-consistent field (SCF) algorithms usually refer to numerical methods for solving the Hartree-Fock or the Kohn-Sham equations. By extension, they also refer to constrained optimization algorithms aiming at minimizing the Hartree-Fock or the Kohn-Sham energy functional on the set of admissible states.

Discretization of the Hartree-Fock Model

As usual in electronic structure calculation, we adopt the system of atomic units, obtained by setting to 1 the values of the reduced Planck constant $\hbar$, of the elementary charge, of the mass of the electron, and of the constant $4\pi\varepsilon_0$, where $\varepsilon_0$ is the dielectric permittivity of the vacuum.

The Hartree-Fock model reads, for a molecular system containing $N$ electrons, as

$$ E_0^{\mathrm{HF}}(N) = \inf\left\{ E^{\mathrm{HF}}(\Phi),\ \Phi = (\phi_1,\dots,\phi_N)\in\big(H^1(\mathbb{R}^3_\pm)\big)^N,\ \int_{\mathbb{R}^3_\pm}\phi_i\phi_j^* = \delta_{ij} \right\}, \qquad (1) $$

where $\mathbb{R}^3_\pm = \mathbb{R}^3\times\{\uparrow,\downarrow\}$,

$$ \int_{\mathbb{R}^3_\pm}\phi_i\phi_j^* = \int_{\mathbb{R}^3_\pm}\phi_i(x)\,\phi_j(x)^*\,dx := \sum_{\sigma\in\{\uparrow,\downarrow\}}\int_{\mathbb{R}^3}\phi_i(r,\sigma)\,\phi_j(r,\sigma)^*\,dr, $$

and

$$ E^{\mathrm{HF}}(\Phi) = \frac{1}{2}\sum_{i=1}^N\int_{\mathbb{R}^3_\pm}|\nabla\phi_i|^2 + \int_{\mathbb{R}^3}\rho_\Phi V_{\mathrm{nuc}} + \frac{1}{2}\int_{\mathbb{R}^3}\int_{\mathbb{R}^3}\frac{\rho_\Phi(r)\,\rho_\Phi(r')}{|r-r'|}\,dr\,dr' - \frac{1}{2}\int_{\mathbb{R}^3_\pm}\int_{\mathbb{R}^3_\pm}\frac{|\gamma_\Phi(x,x')|^2}{|r-r'|}\,dx\,dx'. $$

The function $V_{\mathrm{nuc}}$ denotes the nuclear potential. If the molecular system contains $M$ nuclei of charges $z_1,\dots,z_M$ located at positions $R_1,\dots,R_M$, the following holds:

$$ V_{\mathrm{nuc}}(r) = -\sum_{k=1}^M\frac{z_k}{|r-R_k|}. $$

The density $\rho_\Phi$ and the density matrix $\gamma_\Phi$ associated with $\Phi$ are defined by

$$ \rho_\Phi(r) = \sum_{i=1}^N\sum_{\sigma\in\{\uparrow,\downarrow\}}|\phi_i(r,\sigma)|^2 \quad\text{and}\quad \gamma_\Phi(x,x') = \sum_{i=1}^N\phi_i(x)\,\phi_i(x')^*. $$

The derivation of the Hartree-Fock model from the $N$-body Schrödinger equation is detailed in the contribution by I. Catto to the present volume.

The Galerkin approximation of the minimization problem (1) consists in approaching $E_0^{\mathrm{HF}}(N)$ by

$$ E_0^{\mathrm{HF}}(N,\mathcal{V}) = \min\left\{ E^{\mathrm{HF}}(\Phi),\ \Phi = (\phi_1,\dots,\phi_N)\in\mathcal{V}^N,\ \int_{\mathbb{R}^3_\pm}\phi_i\phi_j^* = \delta_{ij} \right\}, \qquad (2) $$

where $\mathcal{V}\subset H^1(\mathbb{R}^3_\pm)$ is a finite dimensional approximation space of dimension $N_b$. Obviously, $E_0^{\mathrm{HF}}(N) \le E_0^{\mathrm{HF}}(N,\mathcal{V})$ for any $\mathcal{V}\subset H^1(\mathbb{R}^3_\pm)$. In the sequel, we denote by $\mathbb{C}^{m,n}$ the vector space of the complex-valued matrices with $m$ lines and $n$ columns, and by $\mathbb{C}^{m,m}_h$ the vector space of the hermitian matrices of size $m\times m$. We endow these spaces with the Frobenius scalar product defined by $(A,B)_F := \mathrm{Tr}(A^*B)$. Choosing a basis $(\chi_1,\dots,\chi_{N_b})$ of $\mathcal{V}$ and expanding $\Phi = (\phi_1,\dots,\phi_N)\in\mathcal{V}^N$ as

$$ \phi_i(x) = \sum_{\mu=1}^{N_b} C_{\mu i}\,\chi_\mu(x), $$

problem (2) also reads

$$ E_0^{\mathrm{HF}}(N,\mathcal{V}) = \min\left\{ E^{\mathrm{HF}}(CC^*),\ C\in\mathbb{C}^{N_b\times N},\ C^*SC = I_N \right\}, \qquad (3) $$

where $I_N$ is the identity matrix of rank $N$ and where

$$ E^{\mathrm{HF}}(D) = \mathrm{Tr}(hD) + \frac{1}{2}\mathrm{Tr}(G(D)D). $$

The entries of the overlap matrix $S$ and of the one-electron Hamiltonian matrix $h$ are given by

$$ S_{\mu\nu} := \int_{\mathbb{R}^3_\pm}\chi_\mu\chi_\nu^* \qquad (4) $$

and

$$ h_{\mu\nu} := \frac{1}{2}\int_{\mathbb{R}^3_\pm}\nabla\chi_\mu\cdot\nabla\chi_\nu^* - \sum_{k=1}^M z_k\int_{\mathbb{R}^3_\pm}\frac{\chi_\mu(x)\,\chi_\nu(x)^*}{|r-R_k|}\,dx. \qquad (5) $$

The linear map $G\in\mathcal{L}(\mathbb{C}^{N_b\times N_b}_h)$ is defined by

$$ [G(D)]_{\mu\nu} := \sum_{\kappa,\lambda=1}^{N_b}\big[(\mu\nu|\kappa\lambda) - (\mu\lambda|\kappa\nu)\big]\,D_{\kappa\lambda}, $$

where $(\mu\nu|\kappa\lambda)$ is the standard notation for the two-electron integrals

$$ (\mu\nu|\kappa\lambda) := \int_{\mathbb{R}^3_\pm}\int_{\mathbb{R}^3_\pm}\frac{\chi_\mu(x)\,\chi_\nu(x)^*\,\chi_\kappa(x')\,\chi_\lambda(x')^*}{|r-r'|}\,dx\,dx'. \qquad (6) $$

We will make use of the symmetry property

$$ \mathrm{Tr}(G(D)D') = \mathrm{Tr}(G(D')D). \qquad (7) $$

The formulation (3) is referred to as the molecular orbital formulation of the Hartree-Fock model, by contrast with the density matrix formulation defined as

$$ E_0^{\mathrm{HF}}(N,\mathcal{V}) = \min\left\{ E^{\mathrm{HF}}(D),\ D\in\mathbb{C}^{N_b\times N_b}_h,\ DSD = D,\ \mathrm{Tr}(SD) = N \right\}. \qquad (8) $$

It is easily checked that problems (3) and (8) are equivalent, remarking that the map $\{C\in\mathbb{C}^{N_b\times N}\,|\,C^*SC = I_N\}\ni C\mapsto CC^*\in\{D\in\mathbb{C}^{N_b\times N_b}_h\,|\,DSD = D,\ \mathrm{Tr}(SD) = N\}$ is onto. For any $\Phi = (\phi_1,\dots,\phi_N)\in\mathcal{V}^N$ with $\phi_i(x) = \sum_{\mu=1}^{N_b}C_{\mu i}\chi_\mu(x)$, the following holds:

$$ \gamma_\Phi(x,x') = \sum_{\mu,\nu=1}^{N_b}D_{\mu\nu}\,\chi_\mu(x)\,\chi_\nu(x')^* \quad\text{with}\quad D = CC^*. $$

We refer to the contribution by Y. Maday for the derivation of a priori and a posteriori estimates on the energy difference $E_0^{\mathrm{HF}}(N,\mathcal{V}) - E_0^{\mathrm{HF}}(N)$, and on the distance between the minimizers of (2) and those of (1), for $L^2$ and Sobolev norms.

Most Hartree-Fock calculations are performed in atomic orbital basis sets (see, e.g., [9] for details), and more specifically with Gaussian type orbitals. The latter are of the form

$$ \chi_\mu(r,\sigma) = \sum_{g=1}^{N_g}P_{\mu,g}\big(r - R_{k(\mu)}\big)\,e^{-\alpha_{\mu,g}|r-R_{k(\mu)}|^2}\,S_\mu(\sigma), \qquad (9) $$

where $P_{\mu,g}$ is a polynomial, $\alpha_{\mu,g} > 0$, and $S_\mu = \alpha$ or $\beta$, with $\alpha(\sigma) = \delta_{\sigma,\uparrow}$ and $\beta(\sigma) = \delta_{\sigma,\downarrow}$. The main advantage of Gaussian type orbitals is that all the integrals in (4)–(6) can be calculated analytically [5].


In order to simplify the notation, we assume in the sequel that the basis $(\chi_1,\dots,\chi_{N_b})$ is orthonormal or, equivalently, that $S = I_{N_b}$. The molecular orbital and density matrix formulations of the discretized Hartree-Fock model then read

$$ E_0^{\mathrm{HF}}(N,\mathcal{V}) = \min\left\{ E^{\mathrm{HF}}(CC^*),\ C\in\mathcal{C} \right\}, \qquad \mathcal{C} = \left\{ C\in\mathbb{C}^{N_b\times N}\,\big|\,C^*C = I_N \right\}, \qquad (10) $$

$$ E_0^{\mathrm{HF}}(N,\mathcal{V}) = \min\left\{ E^{\mathrm{HF}}(D),\ D\in\mathcal{P} \right\}, \qquad \mathcal{P} = \left\{ D\in\mathbb{C}^{N_b\times N_b}_h\,\big|\,D^2 = D,\ \mathrm{Tr}(D) = N \right\}. \qquad (11) $$

Hartree-Fock-Roothaan-Hall Equations

The matrix

$$ F(D) := h + G(D) \qquad (12) $$

is called the Fock matrix associated with $D$. It is the gradient of the Hartree-Fock energy functional at $D$, for the Frobenius scalar product.

It can be proved (see again [9] for details) that if $D$ is a local minimizer of (11), then

$$ \left\{ \begin{array}{l} D = \sum_{i=1}^N\Phi_i\Phi_i^* \\ F(D)\,\Phi_i = \epsilon_i\,\Phi_i \\ \Phi_i^*\Phi_j = \delta_{ij} \\ \epsilon_1 \le \epsilon_2 \le \cdots \le \epsilon_N \ \text{are the lowest $N$ eigenvalues of } F(D). \end{array} \right. \qquad (13) $$

The above system is a nonlinear eigenvalue problem: the vectors $\Phi_i$ are orthonormal eigenvectors of the hermitian matrix $F(D)$ associated with the lowest $N$ eigenvalues of $F(D)$, and the matrix $F(D)$ depends on the eigenvectors $\Phi_i$ through the definition of $D$. The first three equations of (13) are called the Hartree-Fock-Roothaan-Hall equations. The property that $\epsilon_1,\dots,\epsilon_N$ are the lowest $N$ eigenvalues of the hermitian matrix $F(D)$ is referred to as the Aufbau principle. An interesting property of the Hartree-Fock model is that, for any local minimizer of (11), there is a positive gap between the $N$th and $(N+1)$th eigenvalues of $F(D)$: $\gamma = \epsilon_{N+1} - \epsilon_N > 0$ [2, 9]. From a geometrical viewpoint, $D = \sum_{i=1}^N\Phi_i\Phi_i^*$ is the matrix of the orthogonal projector on the vector space spanned by the eigenvectors associated with the lowest $N$ eigenvalues of $F(D)$. Note that (13) can be reformulated without any reference to the molecular orbitals $\Phi_i$ as follows:

$$ D \in \operatorname*{argmin}\left\{ \mathrm{Tr}(F(D)D'),\ D'\in\mathcal{P} \right\}, \qquad (14) $$

and that, as $\gamma = \epsilon_{N+1} - \epsilon_N > 0$, the right-hand side of (14) is a singleton. This formulation is a consequence of the following property: for any hermitian matrix $F\in\mathbb{C}^{N_b\times N_b}_h$ with eigenvalues $\epsilon_1 \le \cdots \le \epsilon_{N_b}$ and any orthogonal projector $D$ of the form $D = \sum_{i=1}^N\Phi_i\Phi_i^*$ with $\Phi_i^*\Phi_j = \delta_{ij}$ and $F\Phi_i = \epsilon_i\Phi_i$, the following holds for all $D'\in\mathcal{P}$:

$$ \mathrm{Tr}(FD') \ge \mathrm{Tr}(FD) + \frac{\epsilon_{N+1} - \epsilon_N}{2}\,\|D - D'\|_F^2. \qquad (15) $$

Roothaan Fixed Point Algorithm

It is very natural to try and solve (13) by means of the following fixed point algorithm, originally introduced by Roothaan in [24]:

$$ \left\{ \begin{array}{l} F(D_k^{\mathrm{Rth}})\,\Phi_{i,k+1} = \epsilon_{i,k+1}\,\Phi_{i,k+1} \\ \Phi_{i,k+1}^*\Phi_{j,k+1} = \delta_{ij} \\ \epsilon_{1,k+1} \le \epsilon_{2,k+1} \le \cdots \le \epsilon_{N,k+1} \ \text{are the lowest $N$ eigenvalues of } F(D_k^{\mathrm{Rth}}) \\ D_{k+1}^{\mathrm{Rth}} = \sum_{i=1}^N\Phi_{i,k+1}\Phi_{i,k+1}^*, \end{array} \right. \qquad (16) $$

which also reads, in view of (15),

$$ D_{k+1}^{\mathrm{Rth}} \in \operatorname*{argmin}\left\{ \mathrm{Tr}\big(F(D_k^{\mathrm{Rth}})D'\big),\ D'\in\mathcal{P} \right\}. \qquad (17) $$

Solving the nonlinear eigenvalue problem (13) then boils down to solving a sequence of linear eigenvalue problems.
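In code, one sweep of this fixed-point map could look as follows; the toy matrices $h$ and $G$ are placeholders standing in for the true one-electron and two-electron contributions, only meant to exercise the iteration in an orthonormal basis ($S = I$).

```python
import numpy as np

def roothaan(h, G, N, D0, maxiter=200, tol=1e-10):
    """Roothaan fixed-point iteration (16)-(17): build F(D) = h + G(D),
    diagonalize it, and assemble the next density matrix from the
    eigenvectors of the lowest N eigenvalues (Aufbau principle)."""
    D = D0
    for _ in range(maxiter):
        F = h + G(D)
        _, C = np.linalg.eigh(F)               # eigenvalues in ascending order
        D_next = C[:, :N] @ C[:, :N].conj().T  # projector on lowest N eigenvectors
        if np.linalg.norm(D_next - D) < tol:
            return D_next
        D = D_next
    return D   # may oscillate between two states, as discussed below

# Toy data (not a physical Fock operator): random hermitian h, simple linear G
nb, N = 8, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((nb, nb))
h = (A + A.T) / 2
G = lambda D: 0.2 * np.trace(D).real * np.eye(nb) - 0.1 * D
D0 = np.diag([1.0] * N + [0.0] * (nb - N))
D = roothaan(h, G, N, D0)
```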

It was, however, early realized that the above algorithm often fails to converge. More precisely, it can be proved that, under the assumption that

$$ \inf_{k\in\mathbb{N}}\,\big(\epsilon_{N+1,k} - \epsilon_{N,k}\big) > 0, \qquad (18) $$


which seems to be always satisfied in practice, the sequence $(D_k^{\mathrm{Rth}})_{k\in\mathbb{N}}$ generated by the Roothaan algorithm either converges to a solution $D$ of the Hartree-Fock-Roothaan-Hall equations satisfying the Aufbau principle,

$$ \|D_k^{\mathrm{Rth}} - D\| \xrightarrow[k\to\infty]{} 0 \quad\text{with } D \text{ satisfying (13)}, \qquad (19) $$

or asymptotically oscillates between two states $D^{\mathrm{even}}$ and $D^{\mathrm{odd}}$, neither of them being a solution to the Hartree-Fock-Roothaan-Hall equations:

$$ \|D_{2k}^{\mathrm{Rth}} - D^{\mathrm{even}}\| \xrightarrow[k\to\infty]{} 0 \quad\text{and}\quad \|D_{2k+1}^{\mathrm{Rth}} - D^{\mathrm{odd}}\| \xrightarrow[k\to\infty]{} 0. \qquad (20) $$

The behavior of the Roothaan algorithm can be explained mathematically by noticing that the sequence $(D_k^{\mathrm{Rth}})_{k\in\mathbb{N}}$ is obtained by minimizing by relaxation the functional

$$ E(D,D') = \mathrm{Tr}(hD) + \mathrm{Tr}(hD') + \mathrm{Tr}(G(D)D'). $$

Indeed, we deduce from (7), (12), and (17) that

$$ \begin{aligned} D_1^{\mathrm{Rth}} &= \operatorname*{argmin}\big\{\mathrm{Tr}(F(D_0^{\mathrm{Rth}})D'),\ D'\in\mathcal{P}\big\} \\ &= \operatorname*{argmin}\big\{\mathrm{Tr}(hD_0^{\mathrm{Rth}}) + \mathrm{Tr}(hD') + \mathrm{Tr}(G(D_0^{\mathrm{Rth}})D'),\ D'\in\mathcal{P}\big\} \\ &= \operatorname*{argmin}\big\{E(D_0^{\mathrm{Rth}},D'),\ D'\in\mathcal{P}\big\}, \\ D_2^{\mathrm{Rth}} &= \operatorname*{argmin}\big\{\mathrm{Tr}(F(D_1^{\mathrm{Rth}})D),\ D\in\mathcal{P}\big\} \\ &= \operatorname*{argmin}\big\{\mathrm{Tr}(hD) + \mathrm{Tr}(hD_1^{\mathrm{Rth}}) + \mathrm{Tr}(G(D_1^{\mathrm{Rth}})D),\ D\in\mathcal{P}\big\} \\ &= \operatorname*{argmin}\big\{\mathrm{Tr}(hD) + \mathrm{Tr}(hD_1^{\mathrm{Rth}}) + \mathrm{Tr}(G(D)D_1^{\mathrm{Rth}}),\ D\in\mathcal{P}\big\} \\ &= \operatorname*{argmin}\big\{E(D,D_1^{\mathrm{Rth}}),\ D\in\mathcal{P}\big\}, \end{aligned} $$

and so on and so forth. Together with (15) and (18), this leads to numerical convergence [6]: $\|D_{k+2}^{\mathrm{Rth}} - D_k^{\mathrm{Rth}}\|_F \to 0$. The convergence/oscillation properties (19)/(20) can then be obtained by resorting to the Łojasiewicz inequality [17].

Oscillations of the Roothaan algorithm are called charge sloshing in the physics literature. Replacing $E(D,D')$ with the penalized functional $E(D,D') + \frac{b}{2}\|D - D'\|_F^2$ ($b > 0$) suppresses the oscillations when $b$ is large enough, but the resulting algorithm

$$ D_{k+1}^b \in \operatorname*{argmin}\left\{ \mathrm{Tr}\big((F(D_k^b) - bD_k^b)\,D'\big),\ D'\in\mathcal{P} \right\} $$

often converges toward a critical point of the Hartree-Fock-Roothaan-Hall equations which does not satisfy the Aufbau principle and is, therefore, not a local minimizer. This algorithm, introduced in [25], is called the level-shifting algorithm. It has been analyzed in [6, 17].

Direct Minimization Methods

The molecular orbital formulation (10) and the density matrix formulation (11) of the discretized Hartree-Fock model are constrained optimization problems. In both cases, the minimization set is a (non-convex) smooth compact manifold. The set $\mathcal{C}$ is a manifold of dimension $NN_b - \frac{1}{2}N(N+1)$, called the Stiefel manifold; the set $\mathcal{P}$ of the rank-$N$ orthogonal projectors in $\mathbb{C}^{N_b}$ is a manifold of dimension $N(N_b - N)$, called the Grassmann manifold. We refer to [12] for an interesting review of optimization methods on Stiefel and Grassmann manifolds.

From a historical viewpoint, the first minimization method for solving the Hartree-Fock-Roothaan-Hall equations, the so-called steepest descent method, was proposed in [20]. It basically consists in performing one unconstrained gradient step on the function $D\mapsto E^{\mathrm{HF}}(D)$ (i.e., $\widetilde{D}_{k+1} = D_k - t\nabla E^{\mathrm{HF}}(D_k) = D_k - tF(D_k)$), followed by a "projection" step $\widetilde{D}_{k+1}\to D_{k+1}\in\mathcal{P}$. The "projection" can be done using McWeeny's purification, an iterative method consisting in replacing at each step $D$ with $3D^2 - 2D^3$. It is easily checked that if $\widetilde{D}_{k+1}$ is close enough to $\mathcal{P}$, the purification method converges quadratically to the point of $\mathcal{P}$ closest to $\widetilde{D}_{k+1}$ for the Frobenius norm. The steepest descent method has the drawback of any basic gradient method: it converges very slowly and is therefore never used in practice.
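A minimal sketch of McWeeny purification as just described; the slightly perturbed projector used to test it is an illustrative assumption.

```python
import numpy as np

def mcweeny_purify(D, maxiter=50, tol=1e-12):
    """McWeeny purification: iterate D <- 3 D^2 - 2 D^3, which converges
    quadratically to the nearest orthogonal projector when the
    eigenvalues of D start close enough to {0, 1}."""
    for _ in range(maxiter):
        D2 = D @ D
        D_next = 3.0 * D2 - 2.0 * D2 @ D
        if np.linalg.norm(D_next - D) < tol:
            return D_next
        D = D_next
    return D

# Example: purify a slightly perturbed rank-2 projector
rng = np.random.default_rng(1)
P = np.diag([1.0, 1.0, 0.0, 0.0])
E = 1e-3 * rng.standard_normal((4, 4))
D = mcweeny_purify(P + (E + E.T) / 2)
print(np.linalg.norm(D @ D - D))   # ~ 0: D is idempotent
```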

Newton-like algorithms for computing Hartree-Fock ground states appeared in the early 1960s with Bacskay's quadratically convergent (QC) method [3]. Bacskay's approach was to lift the constraints and use a standard Newton algorithm for unconstrained optimization. The local maps of the manifold $\mathcal{P}$ used in [3] are the following exponential maps: for any $C\in\mathbb{C}^{N_b\times N_b}$ such that $C^*SC = I_{N_b}$,

$$ P = Ce^{A}D_0e^{-A}C^*, \qquad D_0 = \begin{pmatrix} I_N & 0 \\ 0 & 0 \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & -A_{\mathrm{vo}}^* \\ A_{\mathrm{vo}} & 0 \end{pmatrix}, \qquad A_{\mathrm{vo}}\in\mathbb{C}^{(N_b-N)\times N}; $$

the suffix vo denotes the virtual-occupied off-diagonal block of the matrix $A$. Starting from some reference matrix $C$, the Bacskay QC algorithm performs one Newton step on the unconstrained optimization problem

$$ \min\left\{ E_C(A_{\mathrm{vo}}) := E^{\mathrm{HF}}\big(Ce^{A}D_0e^{-A}C^*\big),\ A_{\mathrm{vo}}\in\mathbb{C}^{(N_b-N)\times N} \right\}, $$

and updates the reference matrix $C$ by replacing $C$ with $Ce^{\widetilde{A}}$, where $\widetilde{A} = \begin{pmatrix} 0 & -\widetilde{A}_{\mathrm{vo}}^* \\ \widetilde{A}_{\mathrm{vo}} & 0 \end{pmatrix}$, $\widetilde{A}_{\mathrm{vo}}$ denoting the result of the Newton step. Newton methods being very expensive in terms of computational cost, various attempts have been made to build quasi-Newton versions of the Bacskay QC algorithm (see, e.g., [10, 13]).

A natural alternative to Bacskay QC is to use Newton-like algorithms for constrained optimization in order to directly tackle problems (10) or (11) (see, e.g., [26]). Trust-region methods for solving the constrained optimization problem (10) have also been developed by Helgaker and co-workers [27], and independently by Martínez and co-workers [14]. Recently, gradient flow methods for solving (10) [1] and (11) [17] have been introduced and analyzed from a mathematical viewpoint.

For molecular systems of moderate size, and when the Hartree-Fock model is discretized in atomic orbital basis sets, direct minimization methods are usually less efficient than the methods based on constraint relaxation or optimal mixing presented in the next two sections.

Lieb Variational Principle and ConstraintRelaxation

We now consider the variational problem

$$\min \left\{ E^{\rm HF}(D), \; D \in \widetilde{\mathcal{P}} \right\}, \qquad (21)$$

$$\widetilde{\mathcal{P}} = \left\{ D \in \mathbb{C}^{N_b \times N_b}_{\rm h} ; \; 0 \le D \le 1, \; {\rm Tr}(D) = N \right\},$$

where $0 \le D \le 1$ means that $0 \le \Phi^* D \Phi \le \Phi^* \Phi$ for all $\Phi \in \mathbb{C}^{N_b}$, or equivalently, that all the eigenvalues of $D$ lie in the range $[0,1]$. It is easily seen that the set $\widetilde{\mathcal{P}}$ is convex. It is in fact the convex hull of the set $\mathcal{P}$. A fundamental remark is that all the local minimizers of (21) are on $\mathcal{P}$ [9]. This is the discretized version of a result by Lieb [18]. It is, therefore, possible to solve the Hartree-Fock model by relaxing the non-convex constraint $D^2 = D$ into the convex constraint $0 \le D \le 1$.

The orthogonal projection of a given hermitian matrix $D$ on $\widetilde{\mathcal{P}}$ for the Frobenius scalar product can be computed by diagonalizing $D$ [8]. The cost of one iteration of the usual projected gradient algorithm [4] is therefore the same as the cost of one iteration of the Roothaan algorithm.

A more efficient algorithm, the Optimal Damping Algorithm (ODA), is the following [7]:

$$D_{k+1} \in \mathop{\rm argmin}\left\{ {\rm Tr}\big(F(\widetilde{D}_k)\, D'\big), \; D' \in \mathcal{P} \right\},$$
$$\widetilde{D}_{k+1} = \mathop{\rm argmin}\left\{ E^{\rm HF}(\widetilde{D}), \; \widetilde{D} \in {\rm Seg}[\widetilde{D}_k, D_{k+1}] \right\},$$

where ${\rm Seg}[\widetilde{D}_k, D_{k+1}] = \{ (1-\lambda)\widetilde{D}_k + \lambda D_{k+1}, \; \lambda \in [0,1] \}$ denotes the line segment linking $\widetilde{D}_k$ and $D_{k+1}$. As $E^{\rm HF}$ is a second degree polynomial in the density matrix, the last step consists in minimizing a quadratic function of $\lambda$ on $[0,1]$, which can be done analytically. The procedure is initialized with $\widetilde{D}_0 = D_0$, $D_0 \in \mathcal{P}$ being the initial guess. The ODA thus generates two sequences of matrices:
• the main sequence of density matrices $(D_k)_{k \in \mathbb{N}} \in \mathcal{P}^{\mathbb{N}}$, which is proven to numerically converge to an Aufbau solution to the Hartree-Fock-Roothaan-Hall equations [9];
• a secondary sequence $(\widetilde{D}_k)_{k \ge 1}$ of matrices belonging to $\widetilde{\mathcal{P}}$.

The Hartree-Fock energy is a Lyapunov functional of ODA: it decays at each iteration. This follows from the fact that for all $D' \in \mathcal{P}$ and all $\lambda \in [0,1]$,

$$E^{\rm HF}\big((1-\lambda)\widetilde{D}_k + \lambda D'\big) = E^{\rm HF}(\widetilde{D}_k) + \lambda \, {\rm Tr}\big(F(\widetilde{D}_k)(D' - \widetilde{D}_k)\big) + \frac{\lambda^2}{2}\, {\rm Tr}\big(G(D' - \widetilde{D}_k)(D' - \widetilde{D}_k)\big). \qquad (22)$$


The "steepest descent" direction, that is, the density matrix $D$ for which the slope $s_{\widetilde{D}_k \to D} = {\rm Tr}\big(F(\widetilde{D}_k)(D - \widetilde{D}_k)\big)$ is minimum, is precisely $D_{k+1}$.

In some sense, ODA is a combination of diagonalization and direct minimization. The practical implementation of ODA is detailed in [7], where numerical tests are also reported. The cost of one ODA iteration is approximately the same as for the Roothaan algorithm. Numerical tests show that ODA is particularly efficient in the early steps of the iterative procedure.
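The analytic line search at the heart of ODA is elementary to implement. The sketch below is a minimal transcription of one ODA iteration assuming user-supplied routines `fock(D)`, `energy(D)`, and `aufbau(F)` (hypothetical names for the Fock matrix $F(D)$, the energy $E^{\rm HF}(D)$, and the Aufbau density matrix built from the lowest $N$ eigenvectors of $F$); it exploits (22), which makes the energy along the segment an explicit quadratic in $\lambda$.

```python
import numpy as np

def oda_step(D_tilde, fock, energy, aufbau):
    """One iteration of the Optimal Damping Algorithm (sketch).

    D_tilde : current pseudo-density matrix in the convex set P~.
    fock, energy, aufbau : problem-specific callables (assumed given).
    """
    F = fock(D_tilde)
    D_new = aufbau(F)                 # minimizes Tr(F(D~_k) D') over P
    Delta = D_new - D_tilde
    s = float(np.real(np.trace(F @ Delta)))   # slope along the segment
    # E((1-l) D~ + l D') is quadratic in l by (22); recover the curvature
    # from the endpoint energy instead of assembling G(Delta) explicitly:
    # E(1) = E(0) + s + c/2, hence
    c = 2.0 * (energy(D_new) - energy(D_tilde) - s)
    if c > 0:
        lam = min(1.0, max(0.0, -s / c))  # minimizer of s*l + c*l^2/2 on [0,1]
    else:
        lam = 1.0                          # concave case: endpoint is optimal
    return D_tilde + lam * Delta
```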

Convergence Acceleration

SCF convergence can be accelerated by performing, at each step of the iterative procedure, a mixing of the previous iterates:

$$\begin{cases} D_{k+1} \in \mathop{\rm argmin}\left\{ {\rm Tr}\big(\widetilde{F}_k D'\big), \; D' \in \mathcal{P} \right\}, \\[1ex] \widetilde{F}_k = \displaystyle\sum_{j=0}^{k} c_{j,k}\, F(D_j), \qquad \displaystyle\sum_{j=0}^{k} c_{j,k} = 1, \end{cases} \qquad (23)$$

where the mixing coefficients $c_{j,k}$ are optimized according to some criterion. Note that in the Hartree-Fock setting, the mean-field Hamiltonian $F(D)$ is affine in $D$, so that mixing the $F(D_j)$'s amounts to mixing the $D_j$'s:

$$\widetilde{F}_k = F(\widetilde{D}_k) \quad \text{where} \quad \widetilde{D}_k = \sum_{j=0}^{k} c_{j,k} D_j.$$

This is no longer true for Kohn-Sham models.

In Pulay's DIIS algorithm [22], the mixing coefficients are obtained by solving

$$\min \left\{ \Big\| \sum_{j=1}^{k} c_j \, [F(D_j), D_j] \Big\|_{\rm F}^2 , \; \sum_{j=1}^{k} c_j = 1 \right\}.$$

The commutator $[F(D), D]$ is in fact the gradient of the functional $A \mapsto E^{\rm HF}(e^A D e^{-A})$ defined on the vector space of the $N_b \times N_b$ antihermitian matrices (note that $e^A D e^{-A} \in \mathcal{P}$ for all $D \in \mathcal{P}$ and $A$ antihermitian); it vanishes when $D$ is a critical point of $E^{\rm HF}$ on $\mathcal{P}$.
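This constrained least-squares problem is commonly solved through the linear system obtained by introducing a Lagrange multiplier for the normalization constraint. The sketch below assumes the error matrices $E_j = [F(D_j), D_j]$ have already been formed and, for simplicity, that they are real; near-linearly dependent errors may require dropping old iterates or regularizing the system.

```python
import numpy as np

def diis_coefficients(errors):
    """Solve min || sum_j c_j E_j ||_F^2 subject to sum_j c_j = 1.

    errors : list of k error matrices E_j = [F(D_j), D_j] (real case).
    Returns the k mixing coefficients c_j (not constrained in sign).
    """
    k = len(errors)
    B = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            B[i, j] = np.sum(errors[i] * errors[j])   # Frobenius inner product
    # KKT system: [B 1; 1^T 0] [c; lambda] = [0; 1]
    A = np.zeros((k + 1, k + 1))
    A[:k, :k] = B
    A[:k, k] = 1.0
    A[k, :k] = 1.0
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    return np.linalg.solve(A, rhs)[:k]
```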

In the EDIIS algorithm [16], the mixing coefficients are chosen to minimize the Hartree-Fock energy of $\widetilde{D}_k$:

$$\min \left\{ E^{\rm HF}\Big( \sum_{j=1}^{k} c_j D_j \Big), \; c_j \ge 0, \; \sum_{j=1}^{k} c_j = 1 \right\}$$

(note that as the $c_j$'s are chosen non-negative, $\widetilde{D}_k$ is the element of $\widetilde{\mathcal{P}}$ which minimizes the Hartree-Fock energy on the convex hull of $\{D_0, D_1, \cdots, D_k\}$).
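Since $E^{\rm HF}$ is quadratic in $D$, the EDIIS problem is a small quadratic program over the unit simplex. One generic (though not the most efficient) way to solve it is a general-purpose constrained optimizer, as in the following sketch; the routine `ehf` stands for a user-supplied Hartree-Fock energy evaluation and is an assumption of this illustration, not part of the original algorithm description.

```python
import numpy as np
from scipy.optimize import minimize

def ediis_coefficients(densities, ehf):
    """min E_HF(sum_j c_j D_j) over the simplex {c_j >= 0, sum_j c_j = 1}."""
    k = len(densities)

    def objective(c):
        D = sum(cj * Dj for cj, Dj in zip(c, densities))
        return ehf(D)

    c0 = np.full(k, 1.0 / k)                  # start at the barycenter
    res = minimize(objective, c0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * k,
                   constraints=[{"type": "eq",
                                 "fun": lambda c: np.sum(c) - 1.0}])
    return res.x
```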

The DIIS algorithm does not always converge. On the other hand, when it converges, it is extremely fast. This nice feature of the DIIS algorithm has not yet been fully explained by rigorous mathematical arguments (see however [23] for a numerical analysis of DIIS-type algorithms in an unconstrained setting).

SCF Algorithms for Kohn-Sham Models

After discretization in a finite basis set, the Kohn-Sham energy functional reads

$$E^{\rm KS}(D) = {\rm Tr}(hD) + \frac{1}{2} {\rm Tr}\big(J(D)D\big) + E_{\rm xc}(D),$$

where $[J(D)]_{\mu\nu} := \sum_{\kappa\lambda} (\mu\nu|\kappa\lambda)\, D_{\kappa\lambda}$ is the Coulomb operator, and where $E_{\rm xc}$ is the exchange-correlation energy functional [11]. In the standard Kohn-Sham model [15], $E^{\rm KS}$ is minimized on $\mathcal{P}$, while in the extended Kohn-Sham model [11], $E^{\rm KS}$ is minimized on the convex set $\widetilde{\mathcal{P}}$. The algorithms presented in the previous sections can be transposed mutatis mutandis to the Kohn-Sham setting, but as $E_{\rm xc}(D)$ is not a second order polynomial in $D$, the mathematical analysis is more complicated. In particular, no rigorous result on the Roothaan algorithm for Kohn-Sham has been published so far.

Note that the equality $F\big(\sum_i c_i D_i\big) = \sum_i c_i F(D_i)$ whenever $\sum_i c_i = 1$ is true for Hartree-Fock with $F(D) = \nabla E^{\rm HF}(D) = h + G(D)$, but not for Kohn-Sham with $F(D) = \nabla E^{\rm KS}(D) = h + J(D) + \nabla E_{\rm xc}(D)$. Consequently, in contrast with the situation encountered in the Hartree-Fock framework, mixing density matrices and mixing Kohn-Sham Hamiltonians are not equivalent procedures. This leads to a variety


of acceleration schemes for Kohn-Sham that boil down to either DIIS or EDIIS in the Hartree-Fock setting. For the sake of brevity, and also because the situation is evolving fast (several new algorithms are proposed every year, and identifying the best algorithms is a matter of debate), we will not present these schemes here.

Let us finally mention that, while iterative methods based on repeated diagonalization of the mean-field Hamiltonian, combined with mixing procedures, are more efficient than direct minimization methods for moderate size molecular systems and when the Kohn-Sham problem is discretized in atomic orbital basis sets, the situation may be different for very large systems, or when finite element or planewave discretization methods are used (see, e.g., [19, 21] and references therein).

Cross-References

▸ A Priori and a Posteriori Error Analysis in Chemistry
▸ Density Functional Theory
▸ Hartree–Fock Type Methods
▸ Linear Scaling Methods
▸ Numerical Analysis of Eigenproblems for Electronic Structure Calculations

References

1. Alouges, F., Audouze, C.: Preconditioned gradient flows and applications to the Hartree-Fock functional. Numer. Methods PDE 25, 380–400 (2009)
2. Bach, V., Lieb, E.H., Loss, M., Solovej, J.P.: There are no unfilled shells in unrestricted Hartree-Fock theory. Phys. Rev. Lett. 72(19), 2981–2983 (1994)
3. Bacskay, G.B.: A quadratically convergent Hartree-Fock (QC-SCF) method. Application to closed shell systems. Chem. Phys. 61, 385–404 (1981)
4. Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.: Numerical Optimization. Theoretical and Practical Aspects. Springer, Berlin/New York (2006)
5. Boys, S.F.: Electronic wavefunctions. I. A general method of calculation for the stationary states of any molecular system. Proc. R. Soc. A 200, 542–554 (1950)
6. Cancès, E., Le Bris, C.: On the convergence of SCF algorithms for the Hartree-Fock equations. M2AN Math. Model. Numer. Anal. 34, 749–774 (2000)
7. Cancès, E., Le Bris, C.: Can we outperform the DIIS approach for electronic structure calculations? Int. J. Quantum Chem. 79, 82–90 (2000)
8. Cancès, E., Pernal, K.: Projected gradient algorithms for Hartree-Fock and density-matrix functional theory. J. Chem. Phys. 128, 134108 (2008)
9. Cancès, E., Defranceschi, M., Kutzelnigg, W., Le Bris, C., Maday, Y.: Computational quantum chemistry: A primer. In: Handbook of Numerical Analysis, vol. X, pp. 3–270. North-Holland, Amsterdam (2003)
10. Chaban, G., Schmidt, M.W., Gordon, M.S.: Approximate second order method for orbital optimization of SCF and MCSCF wavefunctions. Theor. Chem. Acc. 97, 88–95 (1997)
11. Dreizler, R.M., Gross, E.K.U.: Density Functional Theory. Springer, Berlin/New York (1990)
12. Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthonormality constraints. SIAM J. Matrix Anal. Appl. 20, 303–353 (1998)
13. Fischer, T.H., Almlöf, J.: General methods for geometry and wave function optimization. J. Phys. Chem. 96, 9768–9774 (1992)
14. Francisco, J., Martínez, J.M., Martínez, L.: Globally convergent trust-region methods for self-consistent field electronic structure calculations. J. Chem. Phys. 121, 10863–10878 (2004)
15. Kohn, W., Sham, L.J.: Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133 (1965)
16. Kudin, K., Scuseria, G.E., Cancès, E.: A black-box self-consistent field convergence algorithm: one step closer. J. Chem. Phys. 116, 8255–8261 (2002)
17. Levitt, A.: Convergence of gradient-based algorithms for the Hartree-Fock equations. Preprint (2011)
18. Lieb, E.H.: Variational principle for many-fermion systems. Phys. Rev. Lett. 46, 457–459 (1981)
19. Marks, L.D., Luke, D.R.: Robust mixing for ab initio quantum mechanical calculations. Phys. Rev. B 78, 075114 (2008)
20. McWeeny, R.: The density matrix in self-consistent field theory. I. Iterative construction of the density matrix. Proc. R. Soc. Lond. A 235, 496–509 (1956)
21. Mostofi, A.A., Haynes, P.D., Skylaris, C.K., Payne, M.C.: Preconditioned iterative minimization for linear-scaling electronic structure calculations. J. Chem. Phys. 119, 8842–8848 (2003)
22. Pulay, P.: Improved SCF convergence acceleration. J. Comput. Chem. 3, 556–560 (1982)
23. Rohwedder, T., Schneider, R.: An analysis for the DIIS acceleration method used in quantum chemistry calculations. J. Math. Chem. 49, 1889–1914 (2011)
24. Roothaan, C.C.J.: New developments in molecular orbital theory. Rev. Mod. Phys. 23, 69–89 (1951)
25. Saunders, V.R., Hillier, I.H.: A "level-shifting" method for converging closed shell Hartree-Fock wavefunctions. Int. J. Quantum Chem. 7, 699–705 (1973)
26. Shepard, R.: Elimination of the diagonalization bottleneck in parallel direct-SCF methods. Theor. Chim. Acta 84, 343–351 (1993)
27. Thøgersen, L., Olsen, J., Yeager, D., Jørgensen, P., Sałek, P., Helgaker, T.: The trust-region self-consistent field method: towards a black-box optimization in Hartree-Fock and Kohn-Sham theories. J. Chem. Phys. 121, 16–27 (2004)


Semiconductor Device Problems

Ansgar Jüngel
Institut für Analysis und Scientific Computing, Technische Universität Wien, Wien, Austria

Mathematics Subject Classification

82D37; 35Q20; 35Q40; 35Q79; 76Y05

Definition

Highly integrated electric circuits in computer processors mainly consist of semiconductor transistors which amplify and switch electronic signals. Roughly speaking, a semiconductor is a crystalline solid whose conductivity is intermediate between an insulator and a conductor. The modeling and simulation of semiconductor transistors and other devices is of paramount importance in the microelectronics industry to reduce the development cost and time. A semiconductor device problem is defined by the process of deriving physically accurate but computationally feasible model equations and of constructing efficient numerical algorithms for the solution of these equations. Depending on the device structure, size, and operating conditions, the main transport phenomena may be very different, caused by diffusion, drift, scattering, or quantum effects. This leads to a variety of model equations designed for a particular situation or a particular device. Furthermore, often not all available physical information is necessary, and simpler models are needed, helping to reduce the computational cost in the numerical simulation. One may distinguish four model classes: microscopic/mesoscopic and macroscopic semiclassical models and microscopic/mesoscopic and macroscopic quantum models (see Fig. 1).

Description

In the following, we detail only some models from the four model classes since the field of semiconductor device problems became extremely large in recent years. For instance, we ignore compact models, hybrid model approaches, lattice heat equations, transport in subbands and magnetic fields, spintronics, and models for carbon nanotube, graphene, and polymer thin-film materials. For technological aspects, we refer to [9].

Microscopic Semiclassical Models

We are interested in the evolution of charge carriers moving in an electric field. Their motion can be modeled by Newton's law. However, in view of the huge number of electrons involved, the solution of the Newton equations is computationally too expensive. Moreover, we are not interested in the trajectory of each single particle. Hence, a statistical approach seems to be sufficient, introducing the distribution function (or "probability density") $f(x,v,t)$ of an electron ensemble, depending on the position $x \in \mathbb{R}^3$, velocity $v = \dot{x} = dx/dt \in \mathbb{R}^3$, and time $t > 0$. By Liouville's theorem, the trajectory of $f(x(t), v(t), t)$ does not change during time, in the absence of collisions, and hence,

$$0 = \frac{df}{dt} = \partial_t f + \dot{x} \cdot \nabla_x f + \dot{v} \cdot \nabla_v f \quad \text{along trajectories,} \qquad (1)$$

where $\partial_t f = \partial f/\partial t$ and $\nabla_x f$, $\nabla_v f$ are gradients with respect to $x$, $v$, respectively.

Since electrons are quantum particles (and position and velocity cannot be determined with arbitrary accuracy), we need to incorporate some quantum mechanics. As the solution of the many-particle Schrödinger equation in the whole space is out of reach, we need an approximate approach. First, by Bloch's theorem, it is sufficient to solve the Schrödinger equation in a semiconductor lattice cell. Furthermore, the many-particle interactions are described by an effective Coulomb force. Finally, the properties of the semiconductor crystal are incorporated by the semiclassical Newton equations.

More precisely, let $p = \hbar k$ denote the crystal momentum, where $\hbar$ is the reduced Planck constant and $k$ is the wave vector. For electrons with low energy, the velocity is proportional to the wave vector, $\dot{x} = \hbar k/m$, where $m$ is the electron mass at rest. In the general case, we have to take into account the energy band structure of the semiconductor crystal (see [4, 7, 8] for details). Newton's second law is formulated as $\dot{p} = q \nabla_x V$, where $q$ is the elementary charge and $V(x,t)$ is the electric potential. Then, using $\dot{v} = \dot{p}/m$ and $\nabla_k = (m/\hbar)\nabla_v$, (1) becomes the (mesoscopic) Boltzmann transport equation:


[Semiconductor Device Problems, Fig. 1: Hierarchy of some semiconductor models mentioned in the text. The diagram links the four model classes — microscopic/mesoscopic semiclassical (Boltzmann transport equation), macroscopic semiclassical (hydrodynamic, energy-transport, and drift-diffusion equations), microscopic/mesoscopic quantum (Liouville–von Neumann, Schrödinger, Lindblad, and Wigner–Boltzmann equations), and macroscopic quantum (quantum hydrodynamic, quantum energy-transport, and quantum drift-diffusion equations) — via statistical approaches, moment methods, Chapman–Enskog expansions, and constant-temperature limits.]

$$\partial_t f + \frac{\hbar}{m} k \cdot \nabla_x f + \frac{q}{\hbar} \nabla_x V \cdot \nabla_k f = Q(f), \quad (x,k) \in \mathbb{R}^3 \times \mathbb{R}^3, \; t > 0, \qquad (2)$$

where $Q(f)$ models collisions of electrons with phonons, impurities, or other particles. The moments of $f$ are interpreted as the particle density $n(x,t)$, current density $J(x,t)$, and energy density $(ne)(x,t)$:

$$n = \int_{\mathbb{R}^3} f \, dk, \quad J = \frac{\hbar}{m} \int_{\mathbb{R}^3} k f \, dk, \quad ne = \frac{\hbar^2}{2m} \int_{\mathbb{R}^3} |k|^2 f \, dk. \qquad (3)$$

In the self-consistent setting, the electric potential $V$ is computed from the Poisson equation $\varepsilon_s \Delta V = q(n - C(x))$, where $\varepsilon_s$ is the semiconductor permittivity and $C(x)$ models charged background ions (doping profile). Since $n$ depends on the distribution function $f$, the Boltzmann–Poisson system is nonlinear.

The Boltzmann transport equation is defined over the six-dimensional phase space (plus time), whose high dimensionality makes its numerical solution a very challenging task. One approach is to employ the Monte Carlo method, which consists in simulating a stochastic process. Drawbacks of the method are the stochastic nature and the huge computational cost. An alternative is the use of deterministic solvers, for example, expanding the distribution function with spherical harmonics [6].

Macroscopic Semiclassical Models

When collisions become dominant in the semiconductor domain, that is, the mean free path (the length which a particle travels between two consecutive collision events) is much smaller than the device size, a fluid dynamical approach may be appropriate. Macroscopic models are derived from (2) by multiplying the equation by certain weight functions, that is $1$, $k$, and $|k|^2/2$, and integrating over the wave-vector space. Setting all physical constants to one in the following, for notational simplicity, we obtain, using the definitions (3), the balance equations:

$$\partial_t n + {\rm div}_x J = \int_{\mathbb{R}^3} Q(f)\, dk, \quad x \in \mathbb{R}^3, \; t > 0, \qquad (4)$$

$$\partial_t J + {\rm div}_x \int_{\mathbb{R}^3} k \otimes k\, f\, dk - n \nabla_x V = \int_{\mathbb{R}^3} k\, Q(f)\, dk, \qquad (5)$$

$$\partial_t (ne) + \frac{1}{2} {\rm div}_x \int_{\mathbb{R}^3} k |k|^2 f\, dk - \nabla_x V \cdot J = \frac{1}{2} \int_{\mathbb{R}^3} |k|^2 Q(f)\, dk. \qquad (6)$$


The higher-order integrals cannot be expressed in terms of the moments (3), which is called the closure problem. It can be solved by approximating $f$ by the equilibrium distribution $f_0$, which can be justified by a scaling argument and asymptotic analysis. The equilibrium $f_0$ can be determined by maximizing the Boltzmann entropy under the constraints of given moments $n$, $nu$, and $ne$ [4]. Inserting $f_0$ in (4)–(6) gives explicit expressions for the higher-order moments, yielding the so-called hydrodynamic model. Formally, there is some similarity with the Euler equations of fluid dynamics, and there has been an extensive discussion in the literature whether electron shock waves in semiconductors are realistic or not [10].

Diffusion models, which do not exhibit shock solutions, can be derived by a Chapman–Enskog expansion around the equilibrium distribution $f_0$ according to $f = f_0 + \alpha f_1$, where $\alpha > 0$ is the Knudsen number (the ratio of the mean free path and the device length), which is assumed to be small compared to one. The function $f_1$ turns out to be the solution of a certain operator equation involving the collision operator $Q(f)$. Depending on the number of given moments, this leads to the drift-diffusion equations (particle density given):

$$\partial_t n + {\rm div}_x J = 0, \quad J = -\nabla_x n + n \nabla_x V, \quad x \in \mathbb{R}^3, \; t > 0, \qquad (7)$$

or the energy-transport equations (particle and energy densities given):

$$\partial_t n + {\rm div}_x J = 0, \quad J = -\nabla_x n + \frac{n}{T} \nabla_x V, \quad x \in \mathbb{R}^3, \; t > 0, \qquad (8)$$

$$\partial_t (ne) + {\rm div}_x S + nu \cdot \nabla_x V = 0, \quad S = -\frac{3}{2}\big(\nabla_x (nT) - n \nabla_x V\big), \qquad (9)$$

where $ne = \frac{3}{2} nT$, $T$ being the electron temperature, and $S$ is the heat flux. For the derivation of these models, we have assumed that the equilibrium distribution is given by Maxwell–Boltzmann statistics and that the elastic scattering rate is proportional to the wave vector. More general models can be derived too; see [4, Chap. 6].

The drift-diffusion model gives a good description of the transport in semiconductor devices close to equilibrium, but it is not accurate enough for submicron devices due to, for example, temperature effects, which can be modeled by the energy-transport equations.

In the presence of high electric fields, the stationary equations corresponding to (7)–(9) are convection dominant. This can be handled by the Scharfetter–Gummel discretization technique. The key idea is to approximate the current density along each edge in a mesh by a constant, yielding an exponential approximation of the electric potential. This technique is related to mixed finite-element and finite-volume methods [2]. Another idea to eliminate the convective terms is to employ (dual) entropy variables. For instance, for the energy-transport equations, the dual entropy variables are $w = (w_1, w_2) = ((\mu - V)/T, -1/T)$, where $\mu$ is the chemical potential, given by $n = T^{3/2} \exp(\mu/T)$. Then (8) and (9) can be formulated as the system:

$$\partial_t b(w) - {\rm div}\big(D(w,V)\nabla w\big) = 0,$$

where $b(w) = (n, \frac{3}{2} nT)^\top$ and $D(w,V) \in \mathbb{R}^{2\times 2}$ is a symmetric positive definite diffusion matrix [4] such that standard finite-element techniques are applicable.
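In scaled variables, one common form of the Scharfetter–Gummel edge current for the drift-diffusion model (7) uses the Bernoulli function $B(x) = x/(e^x - 1)$. The following sketch shows this flux on a single 1D edge; the sign convention matches (7), and the numerical data are illustrative only.

```python
import numpy as np

def bernoulli(x):
    """B(x) = x / (exp(x) - 1), evaluated stably near x = 0."""
    x = np.asarray(x, dtype=float)
    small = np.abs(x) < 1e-8
    return np.where(small, 1.0 - 0.5 * x,
                    x / np.expm1(np.where(small, 1.0, x)))

def sg_flux(n_left, n_right, v_left, v_right, h):
    """Scharfetter-Gummel current on one edge for J = -dn/dx + n dV/dx.

    Exact when the current is constant and the potential is linear on
    the edge; this is the exponential-fitting property of the scheme.
    """
    dv = v_right - v_left
    return (bernoulli(-dv) * n_left - bernoulli(dv) * n_right) / h

# Zero potential drop reduces to pure diffusion, J = -(n_R - n_L)/h:
print(sg_flux(1.0, 2.0, 0.0, 0.0, 0.1))   # -> -10.0
```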

Microscopic Quantum Models

The semiclassical approach is reasonable if the carriers can be treated as particles. The validity of this description is measured by the de Broglie wavelength $\lambda_B$ corresponding to a thermal average carrier. When the electric potential varies rapidly on the scale of $\lambda_B$ or when the mean free path is much larger than $\lambda_B$, quantum mechanical models are more appropriate. A general description is possible by the Liouville–von Neumann equation:

$$i\varepsilon\, \partial_t \widehat{\rho} = [H, \widehat{\rho}] := H \widehat{\rho} - \widehat{\rho} H, \quad t > 0,$$

for the density matrix operator $\widehat{\rho}$, where $i^2 = -1$, $\varepsilon > 0$ is the scaled Planck constant, and $H$ is the quantum mechanical Hamiltonian. The operator $\widehat{\rho}$ is assumed to possess a complete orthonormal set of eigenfunctions $(\psi_j)$ and eigenvalues $(\lambda_j)$. The sequence of Schrödinger equations $i\varepsilon\, \partial_t \psi_j = H \psi_j$ ($j \in \mathbb{N}$), together with the numbers $\lambda_j \ge 0$, is called a mixed-state Schrödinger system with the particle density $n(x,t) = \sum_{j=1}^{\infty} \lambda_j |\psi_j(x,t)|^2$. In particular, $\lambda_j$ can be interpreted as the occupation probability of the state $\psi_j$.


The Schrödinger equation describes the evolution of a quantum state in an active region of a semiconductor device. It is used when inelastic scattering is sufficiently weak such that phase coherence can be assumed and effects such as resonant tunneling and quantum conductance can be observed. Typically, the device is connected to an exterior medium through access zones, which allows for the injection of charge carriers. Instead of solving the Schrödinger equation in the whole domain (self-consistently coupled to the Poisson equation), one wishes to solve the problem only in the active region and to prescribe transparent boundary conditions at the interfaces between the active and access zones. Such a situation is referred to as an open quantum system. The determination of transparent boundary conditions is a delicate issue since ad hoc approaches often lead to spurious oscillations which deteriorate the numerical solution [1].

Nonreversible interactions of the charge carriers with the environment can be modeled by the Lindblad equation:

$$i\varepsilon\, \partial_t \widehat{\rho} = [H, \widehat{\rho}] + i \sum_k \Big( L_k \widehat{\rho} L_k^* - \frac{1}{2}\big(L_k^* L_k \widehat{\rho} + \widehat{\rho} L_k^* L_k\big) \Big),$$

where $L_k$ are the so-called Lindblad operators and $L_k^*$ is the adjoint of $L_k$. In the Fourier picture, this equation can be formulated as a quantum kinetic equation, the (mesoscopic) Wigner–Boltzmann equation:

$$\partial_t w + p \cdot \nabla_x w + \theta[V] w = Q(w), \quad (x,p) \in \mathbb{R}^3 \times \mathbb{R}^3, \; t > 0, \qquad (10)$$

where $p$ is the crystal momentum, $\theta[V] w$ is the potential operator which is a nonlocal version of the drift term $\nabla_x V \cdot \nabla_p w$ [4, Chap. 11], and $Q(w)$ is the collision operator. The Wigner function $w = W[\widehat{\rho}]$, where $W$ denotes the Wigner transform, is essentially the Fourier transform of the density matrix. A nice feature of the Wigner equation is that it is a phase-space description, similar to the semiclassical Boltzmann equation. Its drawbacks are that the Wigner function cannot be interpreted as a probability density, as the Boltzmann distribution function can, and that the Wigner equation has to be solved in the high-dimensional phase space. A remedy is to derive macroscopic models, which are discussed in the following section.

Macroscopic Quantum Models

Macroscopic models can be derived from the Wigner–Boltzmann equation (10) in a similar manner as from the Boltzmann equation (2). The main difference to the semiclassical approach is the definition of the equilibrium. Maximizing the von Neumann entropy under the constraints of given moments of a Wigner function $w$, the formal solution (if it exists) is given by the so-called quantum Maxwellian $M[w]$, which is a nonlocal version of the semiclassical equilibrium. It was first suggested by Degond and Ringhofer and is related to the (unconstrained) quantum equilibrium given by Wigner in 1932 [3, 5]. We wish to derive moment equations from the Wigner–Boltzmann equation (10) for the particle density $n$, current density $J$, and energy density $ne$, defined by:

$$n = \int_{\mathbb{R}^3} M[w]\, dp, \quad J = \int_{\mathbb{R}^3} p\, M[w]\, dp, \quad ne = \frac{1}{2} \int_{\mathbb{R}^3} |p|^2 M[w]\, dp.$$

Such a program was carried out by Degond et al. [3], using the simple relaxation-type operator $Q(w) = M[w] - w$. This leads to a hierarchy of quantum hydrodynamic and diffusion models which are, in contrast to their semiclassical counterparts, nonlocal.

When we employ only one moment (the particle density) and expand the resulting moment model in powers of $\varepsilon$ up to order $O(\varepsilon^4)$ (to obtain local equations), we arrive at the quantum drift-diffusion (or density-gradient) equations:

$$\partial_t n + {\rm div}_x J = 0, \quad J = -\nabla_x n + n \nabla_x V + \frac{\varepsilon^2}{6} n \nabla_x \left( \frac{\Delta_x \sqrt{n}}{\sqrt{n}} \right), \quad x \in \mathbb{R}^3, \; t > 0.$$

This model is employed to simulate the carrier inversion layer near the oxide of a MOSFET (metal-oxide-semiconductor field-effect transistor). The main difficulty of the numerical discretization is the treatment of the highly nonlinear fourth-order quantum correction. However, there exist efficient exponentially fitted finite-element approximations; see the references of Pinnau in [4, Chap. 12].
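To get a feeling for the quantum correction, the Bohm-potential term $\nabla_x(\Delta_x \sqrt{n}/\sqrt{n})$ can be probed by finite differences; the 1D sketch below, on an illustrative Gaussian density with a uniform grid, is an ad hoc numerical experiment rather than part of any published discretization.

```python
import numpy as np

def bohm_term(n, dx):
    """Finite-difference approximation of d/dx( (sqrt(n))'' / sqrt(n) ) in 1D."""
    s = np.sqrt(n)
    lap = np.gradient(np.gradient(s, dx), dx)   # second derivative of sqrt(n)
    return np.gradient(lap / s, dx)

x = np.linspace(-5.0, 5.0, 401)
n = np.exp(-x**2)                   # for this density, the exact term is 2x
q = bohm_term(n, x[1] - x[0])
print(np.max(np.abs(q[100:300] - 2 * x[100:300])))   # small away from boundary
```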

Formally, the moment equations for the charge carriers and energy density give the quantum


energy-transport model. Since its mathematical structure is less clear, we do not discuss this model [4, Chap. 13.2].

Employing all three moments $n$, $nu$, $ne$, the moment equations, expanded up to terms of order $O(\varepsilon^4)$, become the quantum hydrodynamic equations:

$$\partial_t n + {\rm div}\, J = 0,$$

$$\partial_t J + {\rm div}_x \left( \frac{J \otimes J}{n} + P \right) + n \nabla_x V = -\int_{\mathbb{R}^3} p\, Q(M[w])\, dp,$$

$$\partial_t (ne) - {\rm div}_x \big( (P + ne\,I) u - q \big) + \nabla_x V \cdot J = \frac{1}{2} \int_{\mathbb{R}^3} |p|^2 Q(M[w])\, dp, \quad x \in \mathbb{R}^3, \; t > 0,$$

where $I$ is the identity matrix in $\mathbb{R}^{3\times 3}$, the quantum stress tensor $P$ and the energy density $ne$ are given by:

$$P = nT\,I - \frac{\varepsilon^2}{12} n \nabla_x^2 \log n, \quad ne = \frac{3}{2} nT + \frac{1}{2} n|u|^2 - \frac{\varepsilon^2}{24} n \Delta_x \log n,$$

$u = J/n$ is the mean velocity, and $q = -(\varepsilon^2/24)\, n (\Delta_x u + 2 \nabla_x {\rm div}_x u)$ is the quantum heat flux. When applying a Chapman–Enskog expansion around the quantum equilibrium, viscous effects are added, leading to quantum Navier–Stokes equations [5, Chap. 5]. These models are very interesting from a theoretical viewpoint since they exhibit a surprising nonlinear structure. Simulations of resonant tunneling diodes using these models give qualitatively reasonable results. However, as expected, quantum phenomena are easily destroyed by the occurring diffusive or viscous effects.

References

1. Antoine, X., Arnold, A., Besse, C., Ehrhardt, M., Schädle, A.: A review of transparent and artificial boundary conditions techniques for linear and nonlinear Schrödinger equations. Commun. Comput. Phys. 4, 729–796 (2008)
2. Brezzi, F., Marini, L., Micheletti, S., Pietra, P., Sacco, R., Wang, S.: Discretization of semiconductor device problems. In: Schilders, W., ter Maten, W. (eds.) Handbook of Numerical Analysis. Numerical Methods in Electromagnetics, vol. 13, pp. 317–441. North-Holland, Amsterdam (2005)
3. Degond, P., Gallego, S., Méhats, F., Ringhofer, C.: Quantum hydrodynamic and diffusion models derived from the entropy principle. In: Ben Abdallah, N., Frosali, G. (eds.) Quantum Transport: Modelling, Analysis and Asymptotics. Lecture Notes in Mathematics 1946, pp. 111–168. Springer, Berlin (2009)
4. Jüngel, A.: Transport Equations for Semiconductors. Springer, Berlin (2009)
5. Jüngel, A.: Dissipative quantum fluid models. Revista Mat. Univ. Parma, to appear (2011)
6. Hong, S.-M., Pham, A.-T., Jungemann, C.: Deterministic Solvers for the Boltzmann Transport Equation. Springer, New York (2011)
7. Lundstrom, M.: Fundamentals of Carrier Transport. Cambridge University Press, Cambridge (2000)
8. Markowich, P., Ringhofer, C., Schmeiser, C.: Semiconductor Equations. Springer, Vienna (1990)
9. Mishra, U., Singh, J.: Semiconductor Device Physics and Design. Springer, Dordrecht (2007)
10. Rudan, M., Gnudi, A., Quade, W.: A generalized approach to the hydrodynamic model of semiconductor equations. In: Baccarani, G. (ed.) Process and Device Modeling for Microelectronics, pp. 109–154. Elsevier, Amsterdam (1993)

Shearlets

Gitta Kutyniok
Institut für Mathematik, Technische Universität Berlin, Berlin, Germany

Mathematics Subject Classification

42C40; 42C15; 65T60

Synonyms

Shearlets; Shearlet system

Short Description

Shearlets are multiscale systems in $L^2(\mathbb{R}^2)$ which efficiently encode anisotropic features. They extend the framework of wavelets and are constructed by parabolic scaling, shearing, and translation applied to one or very few generating functions. The main application area of shearlets is imaging science, for example, denoising, edge detection, or inpainting. Extensions of shearlet systems to $L^2(\mathbb{R}^n)$, $n \ge 3$, are also available.


Description

Multivariate Problems

Multivariate problem classes are typically governed by anisotropic features such as singularities concentrated on lower dimensional embedded manifolds. Examples are edges in images or shock fronts of transport dominated equations. Since, due to their isotropic nature, wavelets are deficient at efficiently encoding such functions, several directional representation systems were proposed, among which are ridgelets, contourlets, and curvelets.

Shearlets were introduced in 2006 [10] and are to date the only directional representation system which provides optimally sparse approximations of anisotropic features while providing a unified treatment of the continuum and digital realm in the sense of allowing faithful implementations. One important structural property is their membership in the class of affine systems, similar to wavelets. A comprehensive presentation of the theory and applications of shearlets can be found in [16].

Continuous Shearlet Systems

Continuous shearlet systems are generated by application of parabolic scaling $A_a$, $\widetilde{A}_a$, $a > 0$, shearing $S_s$, $s \in \mathbb{R}$, and translation, where

$$A_a = \begin{pmatrix} a & 0 \\ 0 & a^{1/2} \end{pmatrix}, \quad \widetilde{A}_a = \begin{pmatrix} a^{1/2} & 0 \\ 0 & a \end{pmatrix}, \quad \text{and} \quad S_s = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix},$$

to one or very few generating functions. For $\psi \in L^2(\mathbb{R}^2)$, the associated continuous shearlet system is defined by

$$\left\{ \psi_{a,s,t} = a^{-3/4}\, \psi\big(A_a^{-1} S_s^{-1} (\cdot - t)\big) : a > 0, \; s \in \mathbb{R}, \; t \in \mathbb{R}^2 \right\},$$

with $a$ determining the scale, $s$ the direction, and $t$ the position of a shearlet $\psi_{a,s,t}$. The associated continuous shearlet transform of some function $f \in L^2(\mathbb{R}^2)$ is the mapping

$$L^2(\mathbb{R}^2) \ni f \mapsto \langle f, \psi_{a,s,t} \rangle, \quad a > 0, \; s \in \mathbb{R}, \; t \in \mathbb{R}^2.$$

The continuous shearlet transform is an isometry, provided that $\psi$ satisfies some weak regularity conditions.
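To make the parametrization concrete, a single atom $\psi_{a,s,t}$ can be evaluated on a grid directly from the definition; in the sketch below, the generator is a toy separable oscillating bump chosen only for illustration, not a classical shearlet.

```python
import numpy as np

def shearlet_atom(psi, a, s, t, X, Y):
    """Evaluate psi_{a,s,t}(x) = a^(-3/4) psi(A_a^{-1} S_s^{-1} (x - t)).

    psi : callable generator psi(x1, x2); X, Y : coordinate grids.
    """
    x1, x2 = X - t[0], Y - t[1]
    # S_s^{-1} = [[1, -s], [0, 1]], A_a^{-1} = diag(1/a, a^(-1/2))
    y1 = (x1 - s * x2) / a
    y2 = x2 / np.sqrt(a)
    return a ** (-0.75) * psi(y1, y2)

# Toy generator: oscillation in x1, bump in x2 (illustration only).
psi = lambda x1, x2: np.sin(5 * x1) * np.exp(-(x1**2 + x2**2))
X, Y = np.meshgrid(np.linspace(-2, 2, 256), np.linspace(-2, 2, 256))
atom = shearlet_atom(psi, 0.25, 0.5, (0.0, 0.0), X, Y)
```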

One common class of generating functions are classical shearlets, which are band-limited functions $\psi \in L^2(\mathbb{R}^2)$ defined by

$$\widehat{\psi}(\xi) = \widehat{\psi}(\xi_1, \xi_2) = \widehat{\psi}_1(\xi_1)\, \widehat{\psi}_2\!\left(\frac{\xi_2}{\xi_1}\right),$$

where $\psi_1 \in L^2(\mathbb{R})$ is a discrete wavelet, i.e., it satisfies $\sum_{j \in \mathbb{Z}} |\widehat{\psi}_1(2^{-j}\xi)|^2 = 1$ for a.e. $\xi \in \mathbb{R}$, with $\widehat{\psi}_1 \in C^\infty(\mathbb{R})$ and ${\rm supp}\, \widehat{\psi}_1 \subseteq [-\frac{1}{2}, -\frac{1}{16}] \cup [\frac{1}{16}, \frac{1}{2}]$, and $\psi_2 \in L^2(\mathbb{R})$ is a "bump function" in the sense that $\sum_{k=-1}^{1} |\widehat{\psi}_2(\xi + k)|^2 = 1$ for a.e. $\xi \in [-1, 1]$, with $\widehat{\psi}_2 \in C^\infty(\mathbb{R})$ and ${\rm supp}\, \widehat{\psi}_2 \subseteq [-1, 1]$. Figure 1 illustrates classical shearlets and the tiling of the Fourier domain they provide, which ensures their directional sensitivity.

From a mathematical standpoint, continuous shearlets are generated by a unitary representation of a particular semi-direct product, the shearlet group [2]. However, since those systems and their associated transforms do not provide a uniform resolution of all directions but are biased towards one axis, cone-adapted continuous shearlet systems were introduced for applications. For $\phi, \psi, \widetilde{\psi} \in L^2(\mathbb{R}^2)$, the cone-adapted continuous shearlet system ${\rm SH}_{\rm cont}(\phi, \psi, \widetilde{\psi})$ is defined by

$${\rm SH}_{\rm cont}(\phi, \psi, \widetilde{\psi}) = \Phi_{\rm cont}(\phi) \cup \Psi_{\rm cont}(\psi) \cup \widetilde{\Psi}_{\rm cont}(\widetilde{\psi}),$$

where

$$\Phi_{\rm cont}(\phi) = \{\phi_t = \phi(\cdot - t) : t \in \mathbb{R}^2\},$$
$$\Psi_{\rm cont}(\psi) = \{\psi_{a,s,t} = a^{-3/4}\, \psi\big(A_a^{-1} S_s^{-1}(\cdot - t)\big) : a \in (0,1], \; |s| \le 1 + a^{1/2}, \; t \in \mathbb{R}^2\},$$
$$\widetilde{\Psi}_{\rm cont}(\widetilde{\psi}) = \{\widetilde{\psi}_{a,s,t} = a^{-3/4}\, \widetilde{\psi}\big(\widetilde{A}_a^{-1} S_s^{-T}(\cdot - t)\big) : a \in (0,1], \; |s| \le 1 + a^{1/2}, \; t \in \mathbb{R}^2\}.$$

The associated transform is defined in a similar manner as before. The induced uniform resolution of all directions by a cone-like partition of the Fourier domain is illustrated in Fig. 2.

The high directional selectivity of cone-adapted continuous shearlet systems is reflected in the result that they precisely resolve wavefront sets of distributions $f$ by the decay behavior of $|\langle f, \psi_{a,s,t} \rangle|$ and $|\langle f, \widetilde{\psi}_{a,s,t} \rangle|$ as $a \to 0$ [15].

[Shearlets, Fig. 1: Classical shearlets: (a) $|\psi_{a,s,t}|$ for exemplary values of $a$, $s$, and $t$. (b) Support of $\widehat{\psi}$. (c) Approximate support of $\widehat{\psi}_{a,s,t}$ for different values of $a$ and $s$.]

Discrete Shearlet Systems

Discretization of the parameters $a$, $s$, and $t$ by $a = 2^{-j}$, $j \in \mathbb{Z}$, $s = -k 2^{-j/2}$, $k \in \mathbb{Z}$, and $t = A_{2^j}^{-1} S_k^{-1} m$, $m \in \mathbb{Z}^2$, leads to the associated discrete systems. For $\psi \in L^2(\mathbb{R}^2)$, the discrete shearlet system is defined by

$$\left\{ \psi_{j,k,m} = 2^{\frac{3}{4}j}\, \psi(S_k A_{2^j} \cdot - m) : j, k \in \mathbb{Z}, \; m \in \mathbb{Z}^2 \right\},$$

with $j$ determining the scale, $k$ the direction, and $m$ the position of a shearlet $\psi_{j,k,m}$. The associated discrete shearlet transform of some $f \in L^2(\mathbb{R}^2)$ is the mapping

$$L^2(\mathbb{R}^2) \ni f \mapsto \langle f, \psi_{j,k,m} \rangle, \quad j, k \in \mathbb{Z}, \; m \in \mathbb{Z}^2.$$

Similarly, for $\phi, \psi, \widetilde{\psi} \in L^2(\mathbb{R}^2)$, the cone-adapted discrete shearlet system ${\rm SH}_{\rm disc}(\phi, \psi, \widetilde{\psi})$ is defined by

$${\rm SH}_{\rm disc}(\phi, \psi, \widetilde{\psi}) = \Phi_{\rm disc}(\phi) \cup \Psi_{\rm disc}(\psi) \cup \widetilde{\Psi}_{\rm disc}(\widetilde{\psi}),$$

where

$$\Phi_{\rm disc}(\phi) = \{\phi_m = \phi(\cdot - m) : m \in \mathbb{Z}^2\},$$
$$\Psi_{\rm disc}(\psi) = \{\psi_{j,k,m} = 2^{\frac{3}{4}j}\, \psi(S_k A_{2^j} \cdot - m) : j \ge 0, \; |k| \le \lceil 2^{j/2} \rceil, \; m \in \mathbb{Z}^2\},$$
$$\widetilde{\Psi}_{\rm disc}(\widetilde{\psi}) = \{\widetilde{\psi}_{j,k,m} = 2^{\frac{3}{4}j}\, \widetilde{\psi}(S_k^T \widetilde{A}_{2^j} \cdot - m) : j \ge 0, \; |k| \le \lceil 2^{j/2} \rceil, \; m \in \mathbb{Z}^2\}.$$

To allow more flexibility in the denseness of the positioning of shearlets, sometimes the discretization of the translation parameter $t$ is performed by $t = A_{2^j}^{-1} S_k^{-1} {\rm diag}(c_1, c_2)\, m$, $m \in \mathbb{Z}^2$, $c_1, c_2 > 0$. A very general discretization approach is by coorbit theory, which is however only applicable to the non-cone-adapted setting [4].

For classical (band-limited) shearlets as generating functions, both the discrete shearlet system and the cone-adapted discrete shearlet system – the latter one with a minor adaption at the intersections of the cones and suitable $\phi$ – form tight frames for $L^2(\mathbb{R}^2)$. A theory for compactly supported (cone-adapted) discrete shearlet systems is also available [14]. For a special class of separable generating functions, compactly supported cone-adapted discrete shearlet systems form a frame with the ratio of frame bounds being approximately 4.

Discrete shearlet systems provide optimally sparse approximations of anisotropic features. A customarily employed model are cartoon-like functions, i.e., compactly supported functions in $L^2(\mathbb{R}^2)$ which are $C^2$ apart from a closed piecewise $C^2$ discontinuity curve. Up to a log-factor, discrete shearlet systems based on classical shearlets or compactly supported shearlets satisfying some weak regularity conditions provide the optimal decay rate of the best $N$-term approximation of cartoon-like functions $f$ [7, 17], i.e.,

$$\|f - f_N\|_2^2 \le C\, N^{-2} (\log N)^3 \quad \text{as } N \to \infty,$$

where here $f_N$ denotes the $N$-term shearlet approximation using the $N$ largest coefficients.
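The $N$-term approximation itself is easy to emulate for any transform with an available synthesis operator: keep the $N$ largest-magnitude coefficients and discard the rest. The sketch below does this for a generic analysis/synthesis pair; for lack of a shearlet implementation it uses an orthonormal FFT as a stand-in (a shearlet transform, e.g., from www.ShearLab.org, would be substituted in practice), and the test image is an illustrative cartoon-like disk.

```python
import numpy as np

def n_term_error(f, analysis, synthesis, N):
    """Squared error of the N-term approximation (hard thresholding)."""
    c = analysis(f)
    flat = np.abs(c).ravel()
    if N < flat.size:
        cutoff = np.partition(flat, -N)[-N]       # N-th largest magnitude
        c = np.where(np.abs(c) >= cutoff, c, 0.0)
    return np.sum(np.abs(f - synthesis(c)) ** 2)

analysis = lambda f: np.fft.fft2(f, norm="ortho")
synthesis = lambda c: np.real(np.fft.ifft2(c, norm="ortho"))
x = np.linspace(-1, 1, 128)
f = (np.add.outer(x**2, x**2) < 0.5).astype(float)   # cartoon-like disk
for N in (100, 400, 1600):
    print(N, n_term_error(f, analysis, synthesis, N))
```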

[Shearlets, Fig. 2: Cone-adapted shearlet system: (a) Partitioning into cones. (b) Approximate support of $\widehat{\psi}_{a,s,t}$ and $\widehat{\widetilde{\psi}}_{a,s,t}$ for different values of $a$ and $s$. (c) $|\widehat{\psi}_{a,s,t}|$ for some shearlet $\psi$ and exemplary values of $a$, $s$, and $t$.]

Fast Algorithms

The implementations of the shearlet transform can be grouped into two categories, namely, Fourier-based implementations and implementations in the spatial domain.

Fourier-based implementations aim to produce the same frequency tiling as in Fig. 2b, typically by employing variants of the Pseudo-Polar transform [5, 21]. Spatial domain approaches utilize filters associated with the transform, which are implemented by a convolution in the spatial domain. A fast implementation with separable shearlets was introduced in [22], subdivision schemes are the basis of the algorithmic approach in [19], and a general filter approach was studied in [11].

Several of the associated algorithms are provided atwww.ShearLab.org.

Extensions to Higher Dimensions

Continuous shearlet systems in higher dimensions have been introduced in [3]. In many situations, these systems inherit the property to resolve wavefront sets. The theory of discrete shearlet systems was introduced and studied in [20] in dimension 3, where similar sparse approximation properties were derived, with the possibility to extend the results to higher dimensions.


Applications

Shearlets are nowadays used for a variety of applications which require the representation and processing of multivariate data, such as imaging sciences. Prominent examples are deconvolution [23], denoising [6], edge detection [9], inpainting [13], segmentation [12], and separation [18]. Other application areas are sparse decompositions of operators such as the Radon operator [1] or Fourier integral operators [8].

References

1. Colonna, F., Easley, G., Guo, K., Labate, D.: Radon transform inversion using the shearlet representation. Appl. Comput. Harmon. Anal. 29, 232–250 (2010)
2. Dahlke, S., Kutyniok, G., Maass, P., Sagiv, C., Stark, H.-G., Teschke, G.: The uncertainty principle associated with the continuous shearlet transform. Int. J. Wavelets Multiresolut. Inf. Process. 6, 157–181 (2008)
3. Dahlke, S., Steidl, G., Teschke, G.: The continuous shearlet transform in arbitrary space dimensions. J. Fourier Anal. Appl. 16, 340–364 (2010)
4. Dahlke, S., Steidl, G., Teschke, G.: Shearlet coorbit spaces: compactly supported analyzing shearlets, traces and embeddings. J. Fourier Anal. Appl. 17, 1232–1255 (2011)
5. Easley, G., Labate, D., Lim, W.-Q.: Sparse directional image representations using the discrete shearlet transform. Appl. Comput. Harmon. Anal. 25, 25–46 (2008)
6. Easley, G., Labate, D., Colonna, F.: Shearlet based total variation for denoising. IEEE Trans. Image Process. 18, 260–268 (2009)
7. Guo, K., Labate, D.: Optimally sparse multidimensional representation using shearlets. SIAM J. Math. Anal. 39, 298–318 (2007)
8. Guo, K., Labate, D.: Representation of Fourier integral operators using shearlets. J. Fourier Anal. Appl. 14, 327–371 (2008)
9. Guo, K., Labate, D.: Characterization and analysis of edges using the continuous shearlet transform. SIAM J. Imaging Sci. 2, 959–986 (2009)
10. Guo, K., Kutyniok, G., Labate, D.: Sparse multidimensional representations using anisotropic dilation and shear operators. In: Wavelets and Splines (Athens, 2005), pp. 189–201. Nashboro Press, Nashville (2006)
11. Han, B., Kutyniok, G., Shen, Z.: Adaptive multiresolution analysis structures and shearlet systems. SIAM J. Numer. Anal. 49, 1921–1946 (2011)
12. Häuser, S., Steidl, G.: Convex multiclass segmentation with shearlet regularization. Int. J. Comput. Math. 90, 62–81 (2013)
13. King, E.J., Kutyniok, G., Zhuang, X.: Analysis of inpainting via clustered sparsity and microlocal analysis. J. Math. Imaging Vis. (to appear)
14. Kittipoom, P., Kutyniok, G., Lim, W.-Q.: Construction of compactly supported shearlet frames. Constr. Approx. 35, 21–72 (2012)
15. Kutyniok, G., Labate, D.: Resolution of the wavefront set using continuous shearlets. Trans. Am. Math. Soc. 361, 2719–2754 (2009)
16. Kutyniok, G., Labate, D. (eds.): Shearlets: Multiscale Analysis for Multivariate Data. Birkhäuser, Boston (2012)
17. Kutyniok, G., Lim, W.-Q.: Compactly supported shearlets are optimally sparse. J. Approx. Theory 163, 1564–1589 (2011)
18. Kutyniok, G., Lim, W.-Q.: Image separation using wavelets and shearlets. In: Curves and Surfaces (Avignon, 2010). Lecture Notes in Computer Science, vol. 6920. Springer, Berlin/Heidelberg (2012)
19. Kutyniok, G., Sauer, T.: Adaptive directional subdivision schemes and shearlet multiresolution analysis. SIAM J. Math. Anal. 41, 1436–1471 (2009)
20. Kutyniok, G., Lemvig, J., Lim, W.-Q.: Optimally sparse approximations of 3D functions by compactly supported shearlet frames. SIAM J. Math. Anal. 44, 2962–3017 (2012)
21. Kutyniok, G., Shahram, M., Zhuang, X.: ShearLab: a rational design of a digital parabolic scaling algorithm. SIAM J. Imaging Sci. 5, 1291–1332 (2012)
22. Lim, W.-Q.: The discrete shearlet transform: a new directional transform and compactly supported shearlet frames. IEEE Trans. Image Process. 19, 1166–1180 (2010)
23. Patel, V.M., Easley, G., Healy, D.M.: Shearlet-based deconvolution. IEEE Trans. Image Process. 18, 2673–2685 (2009)

Shift-Invariant Approximation

Robert Schaback
Institut für Numerische und Angewandte Mathematik (NAM), Georg-August-Universität Göttingen, Göttingen, Germany

Mathematics Subject Classification

41A25; 42C15; 42C25; 42C40

Synonyms

Approximation by integer translates

Short Definition

Shift-invariant approximation deals with functions $f$ on the whole real line, e.g., time series and signals. It approximates $f$ by shifted copies of a single generator $\varphi$, i.e.,

$$f(x) \approx S_{f,h,\varphi}(x) := \sum_{k \in \mathbb{Z}} c_{k,h}(f)\, \varphi\!\left(\frac{x}{h} - k\right), \quad x \in \mathbb{R}. \qquad (1)$$

The functions $\varphi\!\left(\frac{\cdot}{h} - k\right)$ for $k \in \mathbb{Z}$ span a space that is shift-invariant wrt. integer multiples of $h$. Extensions [1, 2] allow multiple generators and multivariate functions. Shift-invariant approximation uses only a single scale $h$, while wavelets use multiple scales and refinable generators.

Description

Nyquist–Shannon–Whittaker–Kotelnikov sampling provides the formula

$$f(x) = \sum_{k \in \mathbb{Z}} f(kh)\, {\rm sinc}\!\left(\frac{x}{h} - k\right)$$

for band-limited functions with frequencies in $[-\pi/h, +\pi/h]$. It is basic in Electrical Engineering for AD/DA conversion of signals after low-pass filtering. Another simple example arises from the hat function or order 2 B-spline $B_2(x) := 1 - |x|$ for $-1 \le x \le 1$ and $0$ elsewhere. Then the "connect-the-dots" formula

$$f(x) \approx \sum_{k \in \mathbb{Z}} f(kh)\, B_2\!\left(\frac{x}{h} - k\right)$$

is a piecewise linear approximation of $f$ obtained by connecting the values $f(kh)$ by straight lines. These two examples arise from a generator $\varphi$ satisfying the cardinal interpolation conditions $\varphi(k) = \delta_{0k}$, $k \in \mathbb{Z}$, and then the right-hand side of the above formulas interpolates $f$ at all data locations $kh$. If the generator is a higher-order B-spline $B_m$, the approximation

$$f(x) \approx \sum_{k \in \mathbb{Z}} f(kh)\, B_m\!\left(\frac{x}{h} - k\right)$$

goes back to I.J. Schoenberg and is not interpolatory in general.
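A minimal implementation of the operator (1) with the hat generator and sampled coefficients, on illustrative data, is shown below; the truncation range of the sum over $k$ is an arbitrary choice of this sketch.

```python
import numpy as np

def hat(x):
    """Order-2 B-spline generator B2(x) = 1 - |x| on [-1, 1], else 0."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def shift_invariant_approx(f, h, x, kmin=-100, kmax=100):
    """Evaluate sum_k f(kh) * B2(x/h - k) at the points x."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for k in range(kmin, kmax + 1):
        out += f(k * h) * hat(x / h - k)
    return out

f = np.sin
x = np.linspace(0.0, 3.0, 7)                  # the nodes kh for h = 0.5
print(shift_invariant_approx(f, 0.5, x))      # matches sin at the nodes
print(np.sin(x))
```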

So far, these examples of (1) have very special coefficients $c_{k,h}(f) = f(kh)$ arising from sampling the function $f$ at the data locations $h\mathbb{Z}$. This connects shift-invariant approximation to sampling theory. If the shifts of the generator are orthonormal in $L^2(\mathbb{R})$, the coefficients in (1) should be obtained instead as $c_{k,h}(f) = \big(f, \varphi(\tfrac{\cdot}{h} - k)\big)_2$ for any $f \in L^2(\mathbb{R})$ to turn the approximation into an optimal $L^2$ projection. Surprisingly, these two approaches coincide for the sinc case.

Analysis of shift-invariant approximation focuses on the error in (1) for various generators $\varphi$ and for different ways of calculating useful coefficients $c_{k,h}(f)$. Under special technical conditions, e.g., if the generator $\varphi$ is compactly supported, the Strang–Fix conditions [4]

$$\widehat{\varphi}^{(j)}(2\pi k) = \delta_{0k}\,\delta_{0j}, \quad k \in \mathbb{Z}, \; 0 \le j < m,$$

imply that the error of (1) is $O(h^m)$ for $h \to 0$ in the Sobolev space $W_2^m(\mathbb{R})$ if the coefficients are given via $L^2$ projection. This holds for B-spline generators of order $m$.

The basic tool for the analysis of shift-invariant $L^2$ approximation is the bracket product

$$[\varphi, \psi](\omega) := \sum_{k \in \mathbb{Z}} \widehat{\varphi}(\omega + 2k\pi)\, \overline{\widehat{\psi}(\omega + 2k\pi)}, \quad \omega \in \mathbb{R},$$

which is a $2\pi$-periodic function. It should exist pointwise, be in $L^2[-\pi, \pi]$, and satisfy a stability property

$$0 < A \le [\varphi, \varphi](\omega) \le B, \quad \omega \in \mathbb{R}.$$

Then the $L^2$ projector for $h = 1$ has the convenient Fourier transform

$$\widehat{S}_{f,1,\varphi}(\omega) = \frac{[f, \varphi](\omega)}{[\varphi, \varphi](\omega)}\, \widehat{\varphi}(\omega), \quad \omega \in \mathbb{R},$$

and if $[\varphi, \varphi](\omega) = 1/2\pi$ for all $\omega$, the integer shifts $\varphi(\cdot - k)$ for $k \in \mathbb{Z}$ are orthonormal in $L^2(\mathbb{R})$.
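For a concrete generator, the bracket product can be approximated by truncating the series. The sketch below does this for the hat function $B_2$, whose (unnormalized) Fourier transform is $(\sin(\omega/2)/(\omega/2))^2$; note that with this convention the orthonormality threshold differs from the $1/2\pi$ above by the normalization of the transform, and the truncation range is an arbitrary choice.

```python
import numpy as np

def hat_ft(w):
    """Fourier transform of B2: (sin(w/2)/(w/2))^2, equal to 1 at w = 0."""
    return np.sinc(w / (2.0 * np.pi)) ** 2   # numpy sinc(x) = sin(pi x)/(pi x)

def bracket(w, K=50):
    """Truncated bracket product [phi, phi](w) = sum_k |phi_hat(w + 2 pi k)|^2."""
    k = np.arange(-K, K + 1)
    return np.sum(hat_ft(w + 2.0 * np.pi * k) ** 2)

w = np.linspace(-np.pi, np.pi, 5)
print([bracket(wi) for wi in w])   # stays inside a band 0 < A <= . <= B
```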

Fundamental results on shift-invariant approximation are in [1, 2], and the survey [3] gives a comprehensive account of the theory and the historical background.

References

1. de Boor, C., DeVore, R., Ron, A.: Approximation from shift-invariant subspaces of $L_2(\mathbb{R}^d)$. Trans. Am. Math. Soc. 341, 787–806 (1994)
2. de Boor, C., DeVore, R., Ron, A.: The structure of finitely generated shift-invariant spaces in $L_2(\mathbb{R}^d)$. J. Funct. Anal. 119, 37–78 (1994)
3. Jetter, K., Plonka, G.: A survey on $L_2$-approximation orders from shift-invariant spaces. In: Multivariate Approximation and Applications, pp. 73–111. Cambridge University Press, Cambridge (2001)
4. Strang, G., Fix, G.: A Fourier analysis of the finite element variational method. In: Geymonat, G. (ed.) Constructive Aspects of Functional Analysis. C.I.M.E. II Ciclo 1971, pp. 793–840 (1973)

Simulation of Stochastic DifferentialEquations

Denis Talay
INRIA Sophia Antipolis, Valbonne, France

Synonyms

Numerical mathematics; Stochastic analysis; Stochastic numerics

Definition

The development and the mathematical analysis of stochastic numerical methods to obtain approximate solutions of deterministic linear and nonlinear partial differential equations and to simulate stochastic models.

Overview

Owing to powerful computers, one now desires to model and simulate more and more complex physical, chemical, biological, and economic phenomena at various scales. In this context, stochastic models are intensively used because calibration errors cannot be avoided, physical laws are imperfectly known (as in turbulent fluid mechanics), or no physical law exists (as in finance). One then needs to compute moments or more complex statistics of the probability distributions of the stochastic processes involved in the models. A stochastic process is a collection $(X_t)$ of random variables indexed by the time variable $t$.

This is not the only motivation to develop stochastic simulations. As solutions of a wide family of complex deterministic partial differential equations (PDEs) can be represented as expectations of functionals of stochastic processes, stochastic numerical methods are derived from these representations.

We can distinguish several classes of stochastic numerical methods: Monte Carlo methods consist in simulating large numbers of independent paths of a given stochastic process; stochastic particle methods consist in simulating paths of interacting particles whose empirical distribution converges in law to a deterministic measure; and ergodic methods consist in simulating one single path of a given stochastic process up to a large time horizon. Monte Carlo methods allow one to approximate statistics of probability distributions of stochastic models or solutions to linear partial differential equations. Stochastic particle methods approximate solutions to deterministic nonlinear McKean–Vlasov PDEs. Ergodic methods aim to compute statistics of equilibrium measures of stochastic models or to solve elliptic PDEs. See, e.g., [2].

In all cases, one needs to develop numerical approximation methods for paths of stochastic processes. Most of the stochastic processes used as models or involved in stochastic representations of PDEs are obtained as solutions to stochastic differential equations

$$X_t(x) = x + \int_0^t b(X_s(x))\, ds + \int_0^t \sigma(X_s(x))\, dW_s, \qquad (1)$$

where $(W_t)$ is a standard Brownian motion or, more generally, a Lévy process. Existence and uniqueness of solutions, in strong and weak senses, are exhaustively studied, e.g., in [10].

Monte Carlo Methods for Linear PDEs

Set $a(x) := \sigma(x)\, \sigma(x)^t$, and consider the parabolic PDE

$$\frac{\partial u}{\partial t}(t,x) = \sum_{i=1}^{d} b_i(x)\, \partial_i u(t,x) + \frac{1}{2} \sum_{i,j=1}^{d} a_{ij}(x)\, \partial_{ij} u(t,x) \qquad (2)$$

with initial condition $u(0,x) = u_0(x)$. Under various hypotheses on the coefficients $b$ and $\sigma$, it holds that $u(t,x) = \mathbb{E}\, u_0(X_t(x))$, where $X_t(x)$ is the solution to (1).

Let $h > 0$ be a time discretization step. Let $(G_p)$ be independent centered Gaussian vectors with unit covariance matrix. Define the Euler scheme by $\bar{X}_0^h(x) = x$ and the recursive relation

$$\bar{X}_{(p+1)h}^h(x) = \bar{X}_{ph}^h(x) + b\big(\bar{X}_{ph}^h(x)\big)\, h + \sigma\big(\bar{X}_{ph}^h(x)\big)\, \sqrt{h}\, G_{p+1}.$$

The simulation of this random sequence only requires the simulation of independent Gaussian random variables. Given a time horizon $Mh$, independent copies of the sequence $(G_p, 1 \le p \le M)$ provide independent paths $(\bar{X}_{ph}^{h,k}(x), 1 \le p \le M)$. The global error of the Monte Carlo method with $N$ simulations which approximates $u(Mh, x)$ is

$$\mathbb{E}\, u_0(X_{Mh}) - \frac{1}{N} \sum_{k=1}^{N} u_0\big(\bar{X}_{Mh}^{h,k}\big) = \underbrace{\mathbb{E}\, u_0(X_{Mh}) - \mathbb{E}\, u_0\big(\bar{X}_{Mh}^{h}\big)}_{=:\, \epsilon_d(h)} + \underbrace{\mathbb{E}\, u_0\big(\bar{X}_{Mh}^{h}\big) - \frac{1}{N} \sum_{k=1}^{N} u_0\big(\bar{X}_{Mh}^{h,k}\big)}_{=:\, \epsilon_s(h,N)}.$$

Nonasymptotic variants of the central limit theorem imply that the statistical error $\epsilon_s(h,N)$ satisfies

$$\forall M \ge 1, \; \exists C(M) > 0, \quad \mathbb{E}|\epsilon_s(h,N)| \le \frac{C(M)}{\sqrt{N}} \quad \text{for all } 0 < h < 1.$$

Using estimates on the solution to the PDE (2) obtained by PDE analysis or stochastic analysis (stochastic flows theory, Malliavin calculus), one can prove that the discretization error $\epsilon_d(h)$ satisfies the so-called Talay–Tubaro expansion

$$\epsilon_d(h) = C(T,x)\, h + Q_h(u_0, T, x)\, h^2,$$

where $|C(T,x)| + \sup_h |Q_h(u_0, T, x)|$ depend on $b$, $\sigma$, $u_0$, and $T$. Therefore, Romberg extrapolation techniques can be used:

$$\mathbb{E}\, u_0(X_{Mh}) - \mathbb{E}\left\{ \frac{2}{N} \sum_{k=1}^{N} u_0\big(\bar{X}_{Mh}^{h/2,k}\big) - \frac{1}{N} \sum_{k=1}^{N} u_0\big(\bar{X}_{Mh}^{h,k}\big) \right\} = O(h^2).$$

For surveys of results in this direction and various extensions, see [8, 14, 17].

The preceding statistical and discretization error estimates have many applications: computations of European option prices, moments of solutions to mechanical systems with random excitations, etc.
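As an illustration, the Euler scheme, the Monte Carlo average, and the Romberg extrapolation combine in a few lines; the scalar SDE and test function below (an Ornstein–Uhlenbeck process with a quadratic payoff) are arbitrary choices of this sketch, not taken from the entry.

```python
import numpy as np

def euler_mc(b, sigma, u0, x0, T, M, N, rng):
    """Monte Carlo estimate of E[u0(X_T)] with the Euler scheme (M steps)."""
    h = T / M
    X = np.full(N, x0, dtype=float)
    for _ in range(M):
        G = rng.standard_normal(N)
        X += b(X) * h + sigma(X) * np.sqrt(h) * G
    return np.mean(u0(X))

# Illustrative SDE: dX = -X dt + 0.5 dW, payoff u0(x) = x^2.
rng = np.random.default_rng(1)
b = lambda x: -x
sigma = lambda x: 0.5 * np.ones_like(x)
u0 = lambda x: x**2
est_h = euler_mc(b, sigma, u0, 1.0, T=1.0, M=20, N=200_000, rng=rng)
est_h2 = euler_mc(b, sigma, u0, 1.0, T=1.0, M=40, N=200_000, rng=rng)
print(2.0 * est_h2 - est_h)   # Romberg-extrapolated estimate, O(h^2) bias
```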

When the PDE (2) is posed in a domain $D$ with Dirichlet boundary conditions $u(t,x) = g(x)$ on $\partial D$, then $u(t,x) = \mathbb{E}\, f(X_t(x))\, \mathbb{1}_{t < \tau} + \mathbb{E}\, g(X_\tau(x))\, \mathbb{1}_{t \ge \tau}$, where $\tau$ is the first hitting time of $\partial D$ by $(X_t(x))$. An approximation method is obtained by substituting $\bar{X}_{ph \wedge \bar{\tau}_h}^h(x)$ for $X_{t \wedge \tau}(x)$, where $\bar{\tau}_h$ is the first hitting time of $\partial D$ by the interpolated Euler scheme. For a convergence rate analysis, see, e.g., [7].

Let $n(x)$ denote the unit inward normal vector at point $x$ on $\partial D$. When one adds Neumann boundary conditions $\nabla u(t,x) \cdot n(x) = 0$ on $\partial D$ to (2), then $u(t,x) = \mathbb{E}\, f(X_t^\sharp(x))$, where $X^\sharp$ is the solution to an SDE with reflection

$$X_t^\sharp(x) = x + \int_0^t b(X_s^\sharp(x))\, ds + \int_0^t \sigma(X_s^\sharp(x))\, dW_s + \int_0^t n(X_s^\sharp)\, dL_s^\sharp(X),$$

where $(L_t(X))$ is the local time of $X$ at the boundary. Then one constructs the reflected Euler scheme in such a way that the simulation of the local time, which would be complex and numerically unstable, is avoided. This construction and the corresponding error analysis have been developed in [4].

Local times also appear in SDEs related to PDEs with transmission conditions along the discontinuity manifolds of the coefficient $a(x)$, as in the Poisson–Boltzmann equation in molecular dynamics, the Darcy law in fluid mechanics, etc. Specific numerical methods and error analyses were recently developed: see, e.g., [5].

Elliptic PDEs are interpreted by means of solutions to SDEs integrated from time 0 up to infinity or their equilibrium measures. Implicit Euler schemes often are necessary to get stability: see [12]. Alternative efficient methods are those with decreasing step sizes introduced in [11].

Stochastic Particle Methods for Nonlinear PDEs

Consider the following stochastic particle system. The dynamics of the $i$th particle is as follows: given $N$ independent Brownian motions $(W_t^{(i)})$, multidimensional coefficients $B$ and $S$, and McKean interaction kernels $b$ and $\sigma$, the positions $X_t^{(i)}$ solve the stochastic differential system

$$dX_t^{(i)} = B\Big(t, X_t^{(i)}, \frac{1}{N} \sum_{j=1}^{N} b\big(X_t^{(i)}, X_t^{(j)}\big)\Big)\, dt + S\Big(t, X_t^{(i)}, \frac{1}{N} \sum_{j=1}^{N} \sigma\big(X_t^{(i)}, X_t^{(j)}\big)\Big)\, dW_t^{(i)}. \qquad (3)$$

Note that the processes $X_t^{(i)}$ are not independent. However, the propagation of chaos and nonlinear martingale problems theories, developed in a seminal way by McKean and Sznitman, allow one to prove that the probability distribution of the particles' empirical measure process converges weakly when $N$ goes to infinity. The limit distribution is concentrated at the probability law of the process $(X_t)$ solution to the following stochastic differential equation, which is nonlinear in McKean's sense (its coefficients depend on the probability distribution of the solution):

$$\begin{cases} dX_t = B\big(t, X_t, \int b(X_t, y)\, \mu_t(dy)\big)\, dt + S\big(t, X_t, \int \sigma(X_t, y)\, \mu_t(dy)\big)\, dW_t, \\[1ex] \mu_t(dy) := \text{probability distribution of } X_t. \end{cases} \qquad (4)$$

In addition, the flow of the probability distributions $\mu_t$ solves the nonlinear McKean–Vlasov–Fokker–Planck equation

$$\frac{d}{dt} \mu_t = L_{\mu_t}^* \mu_t, \qquad (5)$$

where, $A$ denoting the matrix $S\, S^t$, $L_\mu^*$ is the formal adjoint of the differential operator

$$L_\mu := \sum_k B_k\big(t, x, \textstyle\int b(x,y)\,\mu(dy)\big)\, \partial_k + \frac{1}{2} \sum_{j,k} A_{jk}\big(t, x, \textstyle\int \sigma(x,y)\,\mu(dy)\big)\, \partial_{jk}. \qquad (6)$$

From an analytical point of view, the SDEs (4) provide probabilistic interpretations for macroscopic equations which include, e.g., smoothened versions of the Navier–Stokes and Boltzmann equations: see, e.g., the survey [16] and [13].

From a numerical point of view, whereas the time discretization of $(X_t)$ does not lead to an algorithm since $\mu_t$ is unknown, the Euler scheme for the particle system $\{(X_t^{(i)}), i = 1, \ldots, N\}$ can be simulated: the solution $\mu_t$ to (5) is approximated by the empirical distribution of the simulated particles at time $t$, the number $N$ of the particles being chosen large enough. Compared to the numerical resolution of the McKean–Vlasov–Fokker–Planck equation by deterministic methods, this stochastic numerical approach is numerically relevant in the cases of small viscosities. It is also intensively used, for example, in Lagrangian stochastic simulations of complex flows and in molecular dynamics: see, e.g., [9, 15]. When the functions $B$, $S$, $b$, $\sigma$ are smooth, optimal convergence rates have been obtained for finite time horizons, e.g., in [1, 3].
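A direct transcription of the Euler scheme for the particle system (3) in the scalar case could read as follows; the time-independent coefficients and the attraction-toward-the-mean interaction are illustrative choices of this sketch, not taken from the entry.

```python
import numpy as np

def particle_euler(B, S, b, sigma, x0, T, M, N, rng):
    """Euler scheme for the interacting particle system (3), scalar case."""
    h = T / M
    X = np.full(N, x0, dtype=float)
    for _ in range(M):
        # empirical interaction terms (1/N) sum_j b(X_i, X_j), sigma likewise
        bX = np.mean(b(X[:, None], X[None, :]), axis=1)
        sX = np.mean(sigma(X[:, None], X[None, :]), axis=1)
        G = rng.standard_normal(N)
        X += B(X, bX) * h + S(X, sX) * np.sqrt(h) * G
    return X   # the empirical distribution of X approximates mu_T

rng = np.random.default_rng(2)
b = lambda x, y: y - x                   # attraction toward the mean
sigma = lambda x, y: np.ones_like(x)     # constant diffusion kernel
B = lambda x, m: m                       # drift uses the interaction term
S = lambda x, m: 0.3 * m
X = particle_euler(B, S, b, sigma, 0.0, T=1.0, M=50, N=2000, rng=rng)
print(X.mean(), X.std())
```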

Other stochastic representations have been developed for backward SDEs related to quasi-linear PDEs and variational inequalities. See the survey [6].

References

1. Antonelli, F., Kohatsu-Higa, A.: Rate of convergence of a particle method to the solution of the McKean–Vlasov equation. Ann. Appl. Probab. 12, 423–476 (2002)
2. Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability Series, vol. 57. Springer, New York (2007)
3. Bossy, M.: Optimal rate of convergence of a stochastic particle method to solutions of 1D viscous scalar conservation laws. Math. Comput. 73(246), 777–812 (2004)
4. Bossy, M., Gobet, E., Talay, D.: A symmetrized Euler scheme for an efficient approximation of reflected diffusions. J. Appl. Probab. 41(3), 877–889 (2004)
5. Bossy, M., Champagnat, N., Maire, S., Talay, D.: Probabilistic interpretation and random walk on spheres algorithms for the Poisson-Boltzmann equation in molecular dynamics. ESAIM: M2AN 44(5), 997–1048 (2010)
6. Bouchard, B., Elie, R., Touzi, N.: Discrete-time approximation of BSDEs and probabilistic schemes for fully nonlinear PDEs. In: Advanced Financial Modelling. Radon Series on Computational and Applied Mathematics, vol. 8, pp. 91–124. Walter de Gruyter, Berlin (2008)
7. Gobet, E., Menozzi, S.: Stopped diffusion processes: boundary corrections and overshoot. Stochastic Process. Appl. 120(2), 130–162 (2010)
8. Graham, C., Talay, D.: Mathematical Foundations of Stochastic Simulations. Vol. I: Stochastic Simulations and Monte Carlo Methods. Stochastic Modelling and Applied Probability Series, vol. 68. Springer, Berlin/New York (2013, in press)
9. Jourdain, B., Lelièvre, T., Roux, R.: Existence, uniqueness and convergence of a particle approximation for the Adaptive Biasing Force process. M2AN 44, 831–865 (2010)
10. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics, vol. 113. Springer, New York (1991)
11. Lamberton, D., Pagès, G.: Recursive computation of the invariant distribution of a diffusion. Bernoulli 8(23), 367–405 (2002)
12. Mattingly, J., Stuart, A., Tretyakov, M.V.: Convergence of numerical time-averaging and stationary measures via Poisson equations. SIAM J. Numer. Anal. 48(2), 552–577 (2010)
13. Méléard, S.: A trajectorial proof of the vortex method for the two-dimensional Navier–Stokes equation. Ann. Appl. Probab. 10(4), 1197–1211 (2000)
14. Milstein, G.N., Tretyakov, M.V.: Stochastic Numerics for Mathematical Physics. Springer, Berlin/New York (2004)
15. Pope, S.B.: Turbulent Flows. Cambridge University Press, Cambridge/New York (2003)
16. Sznitman, A.-S.: Topics in propagation of chaos. In: École d'Été de Probabilités de Saint-Flour XIX–1989. Lecture Notes in Mathematics, vol. 1464. Springer, Berlin/New York (1991)
17. Talay, D.: Probabilistic numerical methods for partial differential equations: elements of analysis. In: Talay, D., Tubaro, L. (eds.) Probabilistic Models for Nonlinear Partial Differential Equations. Lecture Notes in Mathematics, vol. 1627, pp. 48–196. Springer, Berlin/New York (1996)

Singular Perturbation Problems

Robert O'Malley
Department of Applied Mathematics, University of Washington, Seattle, WA, USA

Regular perturbation methods often succeed in providing approximate solutions to problems involving a small parameter $\varepsilon$ by simply seeking solutions as a formal power series (or even a polynomial) in $\varepsilon$. When the regular perturbation approach fails to provide a uniformly valid approximation, one encounters a singular perturbation problem (using the nomenclature of Friedrichs and Wasow [4], now universal).

The prototype singular perturbation problem occurred as Prandtl's boundary layer theory of 1904, concerning the flow of a fluid of small viscosity past an object (cf. [15, 20]). Applications have continued to motivate the subject, which holds independent mathematical interest involving differential equations. Prandtl's Göttingen lectures from 1931 to 1932 considered the model

$$\varepsilon y'' + y' + y = 0$$

on $0 \le x \le 1$ with prescribed endvalues $y(0)$ and $y(1)$ for a small positive parameter $\varepsilon$ (corresponding physically to a large Reynolds number). Linearly independent solutions of the differential equation are given by

$$e^{-\lambda(\varepsilon)x} \quad\text{and}\quad e^{-\kappa(\varepsilon)x/\varepsilon},$$

where $\lambda(\varepsilon) \equiv \frac{1-\sqrt{1-4\varepsilon}}{2\varepsilon} = 1 + O(\varepsilon)$ and $\kappa(\varepsilon) \equiv \frac{1+\sqrt{1-4\varepsilon}}{2} = 1 - \varepsilon + O(\varepsilon^2)$ as $\varepsilon \to 0$. Setting

$$y(x,\varepsilon) = \alpha\, e^{-\lambda(\varepsilon)x} + \beta\, e^{-\kappa(\varepsilon)x/\varepsilon},$$

we will need $y(0) = \alpha + \beta$ and $y(1) = \alpha e^{-\lambda(\varepsilon)} + \beta e^{-\kappa(\varepsilon)/\varepsilon}$. The large decay constant $\kappa/\varepsilon$ implies that $\alpha \sim y(1)\,e^{\lambda(\varepsilon)}$, so

$$y(x,\varepsilon) \sim e^{\lambda(\varepsilon)(1-x)}\, y(1) + e^{-\kappa(\varepsilon)x/\varepsilon}\big(y(0) - e^{\lambda(\varepsilon)}\, y(1)\big)$$

and

$$y(x,\varepsilon) = e^{1-x}\, y(1) + e^{-x/\varepsilon} e^{x}\big(y(0) - e\,y(1)\big) + O(\varepsilon).$$

The second term decays rapidly from $y(0) - e\,y(1)$ to zero in an $O(\varepsilon)$-thick initial layer near $x = 0$, so the limiting solution

$$e^{1-x}\, y(1)$$

for $x > 0$ satisfies the reduced problem

$$Y_0' + Y_0 = 0 \quad\text{with}\quad Y_0(1) = y(1).$$

Convergence of $y(x,\varepsilon)$ at $x = 0$ is nonuniform unless $y(0) = e\,y(1)$. Indeed, to all orders $\varepsilon^j$, the asymptotic solution for $x > 0$ is given by the outer expansion


$$Y(x,\varepsilon) = e^{\lambda(\varepsilon)(1-x)}\, y(1) \sim \sum_{j\ge 0} Y_j(x)\,\varepsilon^j$$

(see Olver [12] for the definition of an asymptotic expansion). It is supplemented by an initial (boundary) layer correction

$$\xi\Big(\frac{x}{\varepsilon},\varepsilon\Big) \equiv e^{-\kappa(\varepsilon)x/\varepsilon}\big(y(0) - e^{\lambda(\varepsilon)}\, y(1)\big) \sim \sum_{j\ge 0} \xi_j\Big(\frac{x}{\varepsilon}\Big)\varepsilon^j,$$

where the terms $\xi_j$ all decay exponentially to zero as the stretched inner variable $x/\varepsilon$ tends to infinity.
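As a quick numerical illustration (ours, under arbitrary choices of $\varepsilon$ and the endvalues), the sketch below compares the exact solution of $\varepsilon y'' + y' + y = 0$ with the leading-order composite approximation $e^{1-x}y(1) + e^{-x/\varepsilon}\big(y(0) - e\,y(1)\big)$; the discrepancy is uniformly $O(\varepsilon)$.

```python
import numpy as np

eps = 0.01                        # small parameter (arbitrary choice)
y0, y1 = 2.0, 1.0                 # prescribed endvalues y(0), y(1)

# Exact solution: y = alpha*exp(-lam*x) + beta*exp(-kap*x/eps), with
# lam = (1 - sqrt(1-4*eps))/(2*eps) and kap = (1 + sqrt(1-4*eps))/2.
lam = (1.0 - np.sqrt(1.0 - 4.0 * eps)) / (2.0 * eps)
kap = (1.0 + np.sqrt(1.0 - 4.0 * eps)) / 2.0
# Solve the 2x2 linear system for alpha, beta from the endvalues.
M = np.array([[1.0, 1.0],
              [np.exp(-lam), np.exp(-kap / eps)]])
alpha, beta = np.linalg.solve(M, np.array([y0, y1]))

x = np.linspace(0.0, 1.0, 201)
exact = alpha * np.exp(-lam * x) + beta * np.exp(-kap * x / eps)
# Leading-order composite approximation: outer term + boundary layer term.
composite = np.exp(1.0 - x) * y1 + np.exp(-x / eps) * (y0 - np.e * y1)

print(np.max(np.abs(exact - composite)))   # O(eps), uniformly in x
```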

Traditionally, one learns to complement regular outer expansions by local inner expansions in regions of nonuniform convergence. Asymptotic matching methods, generalizing Prandtl's fluid dynamical insights involving inner and outer approximations, then provide higher-order asymptotic solutions (cf. Van Dyke [20], Lagerstrom [11], and Il'in [7], noting that O'Malley [13] and Vasil'eva et al. [21] provide more efficient direct techniques involving boundary layer corrections). The Soviet A.N. Tikhonov and the American Norman Levinson independently provided methods, in about 1950, to solve initial value problems for the slow-fast vector system

$$\dot{x} = f(x,y,t,\varepsilon), \qquad \varepsilon\,\dot{y} = g(x,y,t,\varepsilon)$$

on $t \ge 0$ subject to initial values $x(0)$ and $y(0)$. As we might expect, the outer limit $\begin{pmatrix} X_0(t) \\ Y_0(t) \end{pmatrix}$ should satisfy the reduced (differential-algebraic) system

$$\dot{X}_0 = f(X_0, Y_0, t, 0), \quad X_0(0) = x(0), \qquad 0 = g(X_0, Y_0, t, 0)$$

for an attracting root

$$Y_0 = \phi(X_0, t)$$

of $g = 0$, along which $g_y$ remains a stable matrix. We must expect nonuniform convergence of the fast variable $y$ near $t = 0$, unless $y(0) = Y_0(0)$. Indeed, Tikhonov and Levinson showed that

$$x(t,\varepsilon) = X_0(t) + O(\varepsilon) \quad\text{and}\quad y(t,\varepsilon) = Y_0(t) + \eta_0(t/\varepsilon) + O(\varepsilon)$$

(at least for $t$ finite), where $\eta_0(\tau)$ is the asymptotically stable solution of the stretched problem

$$\frac{d\eta_0}{d\tau} = g\big(x(0),\, \eta_0 + Y_0(0),\, 0,\, 0\big), \qquad \eta_0(0) = y(0) - Y_0(0).$$

The theory supports practical numerical methods for integrating stiff differential equations (cf. [5]). A more inclusive geometric theory, using normally hyperbolic invariant manifolds, has more recently been extensively used (cf. [3, 9]).
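The following sketch (ours; the linear system and all parameter values are arbitrary illustrations) integrates a slow-fast example with a stiff solver and checks it against the Tikhonov–Levinson approximation $X_0(t)$ and $Y_0(t) + \eta_0(t/\varepsilon)$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative slow-fast system: xdot = -y, eps*ydot = x - y.
# The attracting root of g = x - y = 0 is Y0 = X0, and g_y = -1 is stable.
eps = 1e-3
x0, y0 = 1.0, 3.0

def rhs(t, z):
    x, y = z
    return [-y, (x - y) / eps]

sol = solve_ivp(rhs, (0.0, 1.0), [x0, y0], method="Radau",
                dense_output=True, rtol=1e-10, atol=1e-12)

t = np.linspace(0.0, 1.0, 201)
x, y = sol.sol(t)

# Tikhonov-Levinson approximation: reduced solution plus initial layer.
X0 = x0 * np.exp(-t)                    # solves X0' = -X0, X0(0) = x0
Y0 = X0                                 # attracting root of g = 0
eta0 = (y0 - x0) * np.exp(-t / eps)     # layer term solving eta0' = -eta0

print(np.max(np.abs(x - X0)))           # O(eps)
print(np.max(np.abs(y - (Y0 + eta0))))  # O(eps)
```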

For many linear problems, classical analysis (cf. [2, 6, 23]) suffices. Multiscale methods (cf. [8, 10]), however, apply more generally. Consider, for example, the two-point problem

$$\varepsilon y'' + a(x)\, y' + b(x,y) = 0$$

on $0 \le x \le 1$ when $a(x) > 0$ and $a$ and $b$ are smooth. We will seek the solution as $\varepsilon \to 0^+$ when $y(0)$ and $y(1)$ are given, in the form

$$y(x,\eta,\varepsilon) \sim \sum_{j\ge 0} y_j(x,\eta)\,\varepsilon^j$$

using the fast variable

$$\eta = \frac{1}{\varepsilon}\int_0^x a(s)\,ds$$

to provide boundary layer behavior near $x = 0$. Because

$$y' = y_x + \frac{a(x)}{\varepsilon}\, y_\eta$$

and

$$y'' = y_{xx} + \frac{2}{\varepsilon}\, a(x)\, y_{x\eta} + \frac{a'(x)}{\varepsilon}\, y_\eta + \frac{a^2(x)}{\varepsilon^2}\, y_{\eta\eta},$$

the given equation is converted to the partial differential equation

$$a^2(x)\Big(\frac{\partial^2 y}{\partial \eta^2} + \frac{\partial y}{\partial \eta}\Big) + \varepsilon\Big(2 a(x)\,\frac{\partial^2 y}{\partial x\,\partial \eta} + a'(x)\,\frac{\partial y}{\partial \eta} + a(x)\,\frac{\partial y}{\partial x} + b(x,y)\Big) + \varepsilon^2 y_{xx} = 0.$$


We naturally ask the leading term $y_0$ to satisfy

$$\frac{\partial^2 y_0}{\partial \eta^2} + \frac{\partial y_0}{\partial \eta} = 0,$$

so $y_0$ has the form

$$y_0(x,\eta) = A_0(x) + B_0(x)\, e^{-\eta}.$$

The boundary conditions moreover require that

$$A_0(0) + B_0(0) = y(0) \quad\text{and}\quad A_0(1) \approx y(1)$$

(since $e^{-\eta}$ is negligible when $x = 1$). From the $\varepsilon$ coefficient, we find that $y_1$ must satisfy

$$a^2(x)\Big(\frac{\partial^2 y_1}{\partial \eta^2} + \frac{\partial y_1}{\partial \eta}\Big) = -2a(x)\,\frac{\partial^2 y_0}{\partial x\,\partial \eta} - a'(x)\,\frac{\partial y_0}{\partial \eta} - a(x)\,\frac{\partial y_0}{\partial x} - b(x,y_0)$$
$$= -a(x)A_0' + \big(a(x)B_0' + a'(x)B_0\big)e^{-\eta} - b\big(x,\, A_0 + B_0 e^{-\eta}\big).$$

Consider the right-hand side as a power series in $e^{-\eta}$. Undetermined coefficient arguments show that its first two terms (multiplying $1$ and $e^{-\eta}$) will resonate with the solutions of the homogeneous equation to produce unbounded or secular solutions (multiples of $\eta$ and $\eta e^{-\eta}$) as $\eta \to \infty$ unless we require that:
1. $A_0$ satisfies the reduced problem

$$a(x)A_0' + b(x,A_0) = 0, \qquad A_0(1) = y(1)$$

(and continues to exist throughout $0 \le x \le 1$);
2. $B_0$ satisfies the linear problem

$$a(x)B_0' + \big(a'(x) - b_y(x,A_0)\big)B_0 = 0, \qquad B_0(0) = y(0) - A_0(0).$$

Thus, we have completely obtained the limiting solution $y_0(x,\eta)$. We note that the numerical solution of restricted two-point problems is reported in Roos et al. [18]. Special complications, possibly shock layers, must be expected at turning points where $a(x)$ vanishes (cf. [14, 24]).

Related two-time scale methods have long been used in celestial mechanics (cf. [17, 22]) to solve initial value problems for nearly linear oscillators

$$\ddot{y} + y = \varepsilon f(y, \dot{y})$$

on $t \ge 0$. Regular perturbation methods suffice on bounded $t$ intervals, but for $t = O(1/\varepsilon)$ one must seek solutions

$$y(t,\tau,\varepsilon) \sim \sum_{j\ge 0} y_j(t,\tau)\,\varepsilon^j$$

using the slow time

$$\tau = \varepsilon t.$$

We must expect a boundary layer (i.e., nonuniform convergence) at $t = \infty$ to account for the cumulative effect of the small perturbation $\varepsilon f$.

Instead of using two-timing or averaging (cf. [1] or [19]), let us directly seek an asymptotic solution of the initial value problem for the Rayleigh equation

$$\ddot{y} + y = \varepsilon\,\dot{y}\Big(1 - \frac{1}{3}\dot{y}^2\Big)$$

in the form

$$y(t,\tau,\varepsilon) = A(\tau,\varepsilon)e^{it} + \varepsilon B(\tau,\varepsilon)e^{3it} + \varepsilon^2 C(\tau,\varepsilon)e^{5it} + \cdots + \text{complex conjugate}$$

for undetermined slowly varying complex-valued coefficients $A, B, C, \ldots$ depending on $\tau$ and $\varepsilon$ (cf. [16]). Differentiating twice and separating the coefficients of the odd harmonics $e^{it}, e^{3it}, e^{5it}, \ldots$ in the differential equation, we obtain

$$2i\frac{dA}{d\tau} - iA\big(1 - |A|^2\big) + \varepsilon\Big(\frac{d^2A}{d\tau^2} - A^2\frac{dA^*}{d\tau} - \frac{dA}{d\tau}\big(1 - 2|A|^2\big) - 3i(A^*)^2 B\Big) + \cdots = 0,$$
$$-8B - \frac{i}{3}A^3 + \cdots = 0, \quad\text{and}\quad -24C - 3iA^2B + \cdots = 0.$$

The resulting initial value problem

$$\frac{dA}{d\tau} = \frac{A}{2}\Big(\big(1 - |A|^2\big) + \frac{i\varepsilon}{8}\big(|A|^4 - 2\big)\Big) + \cdots$$

for the amplitude $A(\tau,\varepsilon)$ can be readily solved for finite $\tau$ by using polar coordinates and regular perturbation methods on all equations. Thus, we obtain the asymptotic solution

$$y(t,\tau,\varepsilon) = A(\tau,\varepsilon)e^{it} - \frac{i\varepsilon}{24}A^3(\tau,\varepsilon)e^{3it} + \frac{\varepsilon^2}{64}\Big(A^3(\tau,\varepsilon)\big(3|A(\tau,\varepsilon)|^2 - 2\big)e^{3it} - \frac{A^5(\tau,\varepsilon)}{3}e^{5it}\Big) + \cdots + \text{complex conjugate}$$

there. We note that the oscillations for related coupled van der Pol equations are of special current interest in neuroscience.
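As a numerical check (ours, with arbitrary parameter values), one can integrate the full Rayleigh equation alongside the leading-order amplitude equation $dA/d\tau = \frac{A}{2}(1-|A|^2)$ and compare envelopes; both should approach the limit-cycle amplitude $2$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Compare the full Rayleigh equation with the leading-order amplitude
# (envelope) equation dA/dtau = (A/2)(1 - |A|^2). Illustrative values only.
eps = 0.05
y0, v0 = 0.5, 0.0                 # initial displacement and velocity

def rayleigh(t, z):
    y, v = z
    return [v, -y + eps * v * (1.0 - v**2 / 3.0)]

T = 40.0 / eps                    # long time: tau = eps*t runs up to 40
sol = solve_ivp(rayleigh, (0.0, T), [y0, v0], rtol=1e-9, atol=1e-9,
                dense_output=True)

# Leading order: y ~ A e^{it} + c.c. = 2|A| cos(t + arg A), so the
# envelope is 2|A(tau)|. With y(0)=y0, v(0)=0: A(0) = y0/2 (real).
def amp(tau, a):
    return 0.5 * a * (1.0 - a**2)      # real equation for |A|

a = solve_ivp(amp, (0.0, 40.0), [abs(y0) / 2.0],
              t_eval=np.linspace(0.0, 40.0, 400)).y[0]

t = np.linspace(0.0, T, 20000)
y = sol.sol(t)[0]
print("predicted limit envelope:", 2.0 * a[-1])          # -> 2
print("observed max |y| late:", np.abs(y[t > 0.9 * T]).max())
```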

The reader who consults the literature cited will find that singular perturbations continue to provide asymptotic solutions to a broad variety of differential equations from applied mathematics. The underlying mathematics is also extensive and increasingly sophisticated.

References

1. Bogoliubov, N.N., Mitropolski, Y.A.: Asymptotic Methods in the Theory of Nonlinear Oscillations. Gordon and Breach, New York (1961)
2. Fedoryuk, M.V.: Asymptotic Analysis. Springer, New York (1994)
3. Fenichel, N.: Geometric singular perturbation theory for ordinary differential equations. J. Differ. Equ. 15, 77–105 (1979)
4. Friedrichs, K.-O., Wasow, W.: Singular perturbations of nonlinear oscillations. Duke Math. J. 13, 367–381 (1946)
5. Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II. Stiff and Differential-Algebraic Problems, 2nd revised edn. Springer, Berlin (1996)
6. Hsieh, P.-F., Sibuya, Y.: Basic Theory of Ordinary Differential Equations. Springer, New York (1999)
7. Il'in, A.M.: Matching of Asymptotic Expansions of Solutions of Boundary Value Problems. American Mathematical Society, Providence (1992)
8. Johnson, R.S.: Singular Perturbation Theory, Mathematical and Analytical Techniques with Applications to Engineering. Springer, New York (2005)
9. Kaper, T.J.: An introduction to geometric methods and dynamical systems theory for singular perturbation problems. In: Cronin, J., O'Malley, R.E. (eds.) Analyzing Multiscale Phenomena Using Singular Perturbation Methods, pp. 85–131. American Mathematical Society, Providence (1999)
10. Kevorkian, J., Cole, J.D.: Multiple Scale and Singular Perturbation Methods. Springer, New York (1996)
11. Lagerstrom, P.A.: Matched Asymptotic Expansions. Springer, New York (1988)
12. Olver, F.W.J.: Asymptotics and Special Functions. Academic, New York (1974)
13. O'Malley, R.E., Jr.: Singular Perturbation Methods for Ordinary Differential Equations. Springer, New York (1991)
14. O'Malley, R.E., Jr.: Singularly perturbed linear two-point boundary value problems. SIAM Rev. 50, 459–482 (2008)
15. O'Malley, R.E., Jr.: Singular perturbation theory: a viscous flow out of Göttingen. Annu. Rev. Fluid Mech. 42, 1–17 (2010)
16. O'Malley, R.E., Jr., Kirkinis, E.: A combined renormalization group-multiple scale method for singularly perturbed problems. Stud. Appl. Math. 124, 383–410 (2010)
17. Poincaré, H.: Les Méthodes Nouvelles de la Mécanique Céleste II. Gauthier-Villars, Paris (1893)
18. Roos, H.-G., Stynes, M., Tobiska, L.: Robust Numerical Methods for Singularly Perturbed Differential Equations, 2nd edn. Springer, Berlin (2008)
19. Sanders, J.A., Verhulst, F., Murdock, J.: Averaging Methods in Nonlinear Dynamical Systems, 2nd edn. Springer, New York (2007)
20. Van Dyke, M.: Perturbation Methods in Fluid Dynamics. Academic, New York (1964)
21. Vasil'eva, A.B., Butuzov, V.F., Kalachev, L.V.: The Boundary Function Method for Singular Perturbation Problems. SIAM, Philadelphia (1995)
22. Verhulst, F.: Nonlinear Differential Equations and Dynamical Systems, 2nd edn. Springer, New York (2000)
23. Wasow, W.: Asymptotic Expansions for Ordinary Differential Equations. Wiley, New York (1965)
24. Wasow, W.: Linear Turning Point Theory. Springer, New York (1985)

Solid State Physics, Berry Phases and Related Issues

Gianluca Panati
Dipartimento di Matematica, Università di Roma "La Sapienza", Rome, Italy

Synonyms

Dynamics of Bloch electrons; Theory of Bloch bands; Semiclassical model of solid-state physics


Definition/Abstract

Crystalline solids are solids in which the ionic cores of the atoms are arranged periodically. The dynamics of a test electron in a crystalline solid can be conveniently analyzed by using the Bloch-Floquet transform, while the localization properties of electrons are better described by using Wannier functions. The latter can also be obtained by minimizing a suitable localization functional, yielding a convenient numerical algorithm.

Macroscopic transport properties of electrons in crystalline solids are derived, by using adiabatic theory, from the analysis of a perturbed Hamiltonian, which includes the effect of external macroscopic or slowly varying electromagnetic potentials. The geometric Berry phase and its curvature play a prominent role in the corresponding effective dynamics.

The Periodic Hamiltonian

In a crystalline solid, the ionic cores are arranged periodically, according to a periodicity lattice

$$\Gamma = \Big\{\gamma \in \mathbb{R}^d : \gamma = \sum_{j=1}^d n_j \gamma_j \text{ for some } n_j \in \mathbb{Z}\Big\} \simeq \mathbb{Z}^d,$$

where $\{\gamma_1,\ldots,\gamma_d\}$ are fixed linearly independent vectors in $\mathbb{R}^d$.

The dynamics of a test electron in the potential generated by the ionic cores of the solid and, in a mean-field approximation, by the remaining electrons is described by the Schrödinger equation $i\partial_t \psi = H_{per}\psi$, where the Hamiltonian operator reads (in Rydberg units)

$$H_{per} = -\Delta + V_\Gamma(x) \quad\text{acting in } L^2(\mathbb{R}^d). \qquad (1)$$

Here, $\Delta = \nabla^2$ is the Laplace operator and the function $V_\Gamma : \mathbb{R}^d \to \mathbb{R}$ is periodic with respect to $\Gamma$, i.e., $V_\Gamma(x+\gamma) = V_\Gamma(x)$ for all $\gamma \in \Gamma$, $x \in \mathbb{R}^d$. A mathematical justification of such a model in the reduced Hartree-Fock approximation was obtained in Catto et al. [3] and Cancès et al. [4]; see ▶Mathematical Theory for Quantum Crystals and references therein.

To assure that $H_{per}$ is self-adjoint in $L^2(\mathbb{R}^d)$ on the Sobolev space $W^{2,2}(\mathbb{R}^d)$, we make a usual Kato-type assumption on the $\Gamma$-periodic potential:

$$V_\Gamma \in L^2_{loc}(\mathbb{R}^d) \text{ for } d \le 3; \qquad V_\Gamma \in L^p_{loc}(\mathbb{R}^d) \text{ with } p > d/2 \text{ for } d \ge 4. \qquad (2)$$

Clearly, the case of a potential with Coulomb-like singularities is included.

The Bloch–Floquet Transform (Bloch Representation)

Since $H_{per}$ commutes with the lattice translations, it can be decomposed as a direct integral of simpler operators by the (modified) Bloch–Floquet transform. Preliminarily, we define the dual lattice as

$$\Gamma^* := \big\{k \in \mathbb{R}^d : k\cdot\gamma \in 2\pi\mathbb{Z} \text{ for all } \gamma \in \Gamma\big\}.$$

We denote by $Y$ (resp. $Y^*$) the centered fundamental domain of $\Gamma$ (resp. $\Gamma^*$), namely,

$$Y^* = \Big\{k \in \mathbb{R}^d : k = \sum_{j=1}^d k'_j\, \gamma^*_j \text{ for } k'_j \in \big[-\tfrac{1}{2},\tfrac{1}{2}\big]\Big\},$$

where $\{\gamma^*_j\}$ is the dual basis to $\{\gamma_j\}$, i.e., $\gamma^*_j \cdot \gamma_i = 2\pi\delta_{j,i}$. When the opposite faces of $Y^*$ are identified, one obtains the torus $\mathbb{T}^*_d := \mathbb{R}^d/\Gamma^*$.

One defines, initially for $\psi \in C_0(\mathbb{R}^d)$, the modified Bloch–Floquet transform as

$$(\tilde{U}_{BF}\,\psi)(k,y) := \frac{1}{|Y^*|^{1/2}} \sum_{\gamma\in\Gamma} e^{-ik\cdot(y+\gamma)}\,\psi(y+\gamma), \qquad y \in \mathbb{R}^d,\ k \in \mathbb{R}^d. \qquad (3)$$

For any fixed $k \in \mathbb{R}^d$, $(\tilde{U}_{BF}\,\psi)(k,\cdot)$ is a $\Gamma$-periodic function and can thus be regarded as an element of $\mathcal{H}_f := L^2(\mathbb{T}_Y)$, $\mathbb{T}_Y$ being the flat torus $\mathbb{R}^d/\Gamma$. The map defined by (3) extends to a unitary operator $\tilde{U}_{BF} : L^2(\mathbb{R}^d) \to \int^{\oplus}_{Y^*} \mathcal{H}_f\, dk$, with inverse given by

$$(\tilde{U}_{BF}^{-1}\varphi)(x) = \frac{1}{|Y^*|^{1/2}} \int_{Y^*} dk\; e^{ik\cdot x}\,\varphi(k,[x]),$$

where $[\,\cdot\,]$ refers to the decomposition $x = \gamma_x + [x]$, with $\gamma_x \in \Gamma$ and $[x] \in Y$.

The advantage of this construction is that the transformed Hamiltonian is a fibered operator over $Y^*$. Indeed, one checks that

$$\tilde{U}_{BF}\, H_{per}\, \tilde{U}_{BF}^{-1} = \int^{\oplus}_{Y^*} dk\; H_{per}(k)$$

with fiber operator

$$H_{per}(k) = \big(-i\nabla_y + k\big)^2 + V_\Gamma(y), \qquad k \in \mathbb{R}^d, \qquad (4)$$

acting on the $k$-independent domain $W^{2,2}(\mathbb{T}_Y) \subset L^2(\mathbb{T}_Y)$. The latter fact explains why it is mathematically convenient to use the modified BF transform. Each fiber operator $H_{per}(k)$ is self-adjoint, has compact resolvent, and thus pure point spectrum accumulating at infinity. We label the eigenvalues increasingly, i.e., $E_0(k) \le E_1(k) \le E_2(k) \le \ldots$. With this choice, they are $\Gamma^*$-periodic, i.e., $E_n(k+\lambda) = E_n(k)$ for all $\lambda \in \Gamma^*$. The function $k \mapsto E_n(k)$ is called the $n$th Bloch band.

For fixed $k \in Y^*$, one considers the eigenvalue problem

$$H_{per}(k)\, u_n(k,y) = E_n(k)\, u_n(k,y), \qquad \|u_n(k,\cdot)\|_{L^2(\mathbb{T}_Y)} = 1. \qquad (5)$$

A solution to the previous eigenvalue equation (e.g., by numerical simulations) provides a complete solution to the dynamical equation induced by (1). Indeed, if the initial datum $\psi_0$ satisfies

$$(\tilde{U}_{BF}\,\psi_0)(k,y) = \varphi(k)\, u_n(k,y) \quad\text{for some } \varphi \in L^2(Y^*)$$

(one says in jargon that "$\psi_0$ is concentrated on the $n$th band"), then the solution $\psi(t)$ to the Schrödinger equation with initial datum $\psi_0$ is characterized by

$$(\tilde{U}_{BF}\,\psi(t))(k,y) = \big(e^{-iE_n(k)t}\varphi(k)\big)\, u_n(k,y).$$

In particular, the solution is exactly concentrated on the $n$th band at any time. By linearity, one recovers the solution for any initial datum. Below, we will discuss to which extent this dynamical description survives when macroscopic perturbations of the operator (1) are considered.
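For a concrete instance of the eigenvalue problem (5), the sketch below (ours, with an arbitrary cosine potential) computes the lowest Bloch bands of the one-dimensional fiber operator $(-i\,d/dy + k)^2 + V_\Gamma(y)$ by truncating the plane-wave (Fourier) expansion; the lattice is $\Gamma = 2\pi\mathbb{Z}$, so $Y^* = [-\frac{1}{2},\frac{1}{2})$.

```python
import numpy as np

# Bloch bands of H(k) = (-i d/dy + k)^2 + V(y) on the torus [0, 2*pi),
# with the illustrative potential V(y) = 2*cos(y); here Gamma = 2*pi*Z
# and Gamma* = Z. Plane-wave basis e^{i m y}, |m| <= M.
M = 32
modes = np.arange(-M, M + 1)

def bloch_bands(k, n_bands=4):
    # Kinetic part is diagonal: (m + k)^2. The potential 2*cos(y)
    # couples mode m to m+1 and m-1 with coefficient 1.
    H = np.diag((modes + k) ** 2).astype(float)
    off = np.ones(len(modes) - 1)
    H += np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(H)[:n_bands]

ks = np.linspace(-0.5, 0.5, 101)           # Brillouin zone Y* = [-1/2, 1/2]
bands = np.array([bloch_bands(k) for k in ks])
print(bands.min(axis=0))                   # bottom of each band
print(bands.max(axis=0))                   # top of each band; gaps visible
```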

Wannier Functions and Charge Localization

While the Bloch representation is a useful tool to deal with dynamical and energetic problems, it is not convenient to study the localization of electrons in solids. A related crucial problem is the construction of a basis of generalized eigenfunctions of the operator $H_{per}$ which are exponentially localized in space. Indeed, such a basis allows to develop computational methods which scale linearly with the system size [6], makes possible the description of the dynamics by tight-binding effective Hamiltonians, and plays a prominent role in the modern theories of macroscopic polarization [9, 18] and of orbital magnetization [21].

A convenient system of localized generalized eigenfunctions has been proposed by Wannier [22]. By definition, a Bloch function corresponding to the $n$th Bloch band is any $u_n$ satisfying (5). Clearly, if $u_n$ is a Bloch function then $\tilde{u}_n$, defined by $\tilde{u}_n(k,y) = e^{i\vartheta(k)}\, u_n(k,y)$ for any $\Gamma^*$-periodic function $\vartheta$, is also a Bloch function. The latter invariance is often called Bloch gauge invariance.

Definition 1 The Wannier function $w_n \in L^2(\mathbb{R}^d)$ corresponding to a Bloch function $u_n$ for the Bloch band $E_n$ is the preimage of $u_n$ with respect to the Bloch-Floquet transform, namely

$$w_n(x) := \big(\tilde{U}_{BF}^{-1} u_n\big)(x) = \frac{1}{|Y^*|^{1/2}} \int_{Y^*} dk\; e^{ik\cdot x}\, u_n(k,[x]).$$

The translated Wannier functions are

$$w_{n,\gamma}(x) := w_n(x-\gamma) = \frac{1}{|Y^*|^{1/2}} \int_{Y^*} dk\; e^{-ik\cdot\gamma}\, e^{ik\cdot x}\, u_n(k,[x]), \qquad \gamma \in \Gamma.$$

Thus, in view of the orthogonality of the trigonometric polynomials and the fact that $\tilde{U}_{BF}$ is an isometry, the functions $\{w_{n,\gamma}\}_{\gamma\in\Gamma}$ are mutually orthogonal in $L^2(\mathbb{R}^d)$. Moreover, the family $\{w_{n,\gamma}\}_{\gamma\in\Gamma}$ is a complete orthonormal basis of $\tilde{U}_{BF}^{-1}\,\mathrm{Ran}\,P_*$, where $P_*(k)$ is the spectral projection of $H_{per}(k)$ corresponding to the eigenvalue $E_n(k)$ and $P_* = \int^{\oplus}_{Y^*} P_*(k)\,dk$.

In view of the properties of the Bloch–Floquet transform, the existence of an exponentially localized Wannier function for the Bloch band $E_n$ is equivalent to the existence of an analytic and $\Gamma^*$-pseudoperiodic Bloch function (recall that (3) implies that the Bloch function must satisfy $u(k+\lambda, y) = e^{-i\lambda\cdot y}\, u(k,y)$ for all $\lambda \in \Gamma^*$). A local argument assures that there is always a choice of the Bloch gauge such that the Bloch function is analytic around a given point. However, as several authors noticed [5, 13], there might be topological obstruction to obtain a global analytic Bloch function, in view of the competition between the analyticity and the pseudoperiodicity.
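To illustrate Definition 1 in the simplest setting, the following sketch (ours) assembles the lowest-band Wannier function of the 1D model above on a discrete $k$-grid, with the potential changed to $-2\cos y$ and a naive gauge choice $u_0(k,0) > 0$; both modifications are our own conveniences, reasonable here because the lowest band is isolated and the ground state does not vanish at the cell minimum $y = 0$.

```python
import numpy as np

# Wannier function of the lowest band for a 1D model with potential
# V(y) = -2*cos(y) (cell minimum at y = 0); Gamma = 2*pi*Z, Y* = [-1/2, 1/2).
M = 16
modes = np.arange(-M, M + 1)
Nk = 64
ks = -0.5 + np.arange(Nk) / Nk            # uniform grid on Y*

def lowest_bloch(k):
    H = np.diag((modes + k) ** 2).astype(complex)
    H -= np.diag(np.ones(2 * M), 1) + np.diag(np.ones(2 * M), -1)
    c = np.linalg.eigh(H)[1][:, 0]        # Fourier coefficients of u_0(k, .)
    phase = np.sum(c)                     # = u_0(k, 0)
    return c * (abs(phase) / phase)       # gauge: u_0(k, 0) real positive

y = np.linspace(-6 * np.pi, 6 * np.pi, 1201)   # three lattice cells per side
w0 = np.zeros_like(y, dtype=complex)
for k in ks:
    c = lowest_bloch(k)
    u = (c[:, None] * np.exp(1j * np.outer(modes, y))).sum(axis=0)
    w0 += np.exp(1j * k * y) * u / Nk     # Riemann sum of e^{iky} u_0(k, y)

# |w0| peaks at y = 0 and decays rapidly into neighboring cells.
print(np.abs(w0).max(), np.abs(w0[np.abs(y) > 4 * np.pi]).max())
```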

Hereafter, we denote by $\sigma_*(k)$ the set $\{E_i(k) : n \le i \le n+m-1\}$, corresponding to a physically relevant family of $m$ Bloch bands, and we assume the following gap condition:

$$\inf_{k\in\mathbb{T}^*_d} \mathrm{dist}\big(\sigma_*(k),\, \sigma(H(k)) \setminus \sigma_*(k)\big) > 0. \qquad (6)$$

If a Bloch band $E_n$ satisfies (6) for $m = 1$, we say that it is a single isolated Bloch band. For $m > 1$, we refer to a composite family of Bloch bands.

Single Isolated Bloch Band

In the case of a single isolated Bloch band, the problem of proving the existence of exponentially localized Wannier functions was raised in 1959 by W. Kohn [10], who solved it in dimension $d = 1$. In higher dimension, the problem has been solved, always in the case of a single isolated Bloch band, by J. des Cloizeaux [5] (under the nongeneric hypothesis that $V_\Gamma$ has a center of inversion) and finally by G. Nenciu under general hypotheses [12]; see also [8] for an alternative proof. Notice, however, that in real solids it might happen that the interesting Bloch band (e.g., the conduction band in graphene) is not isolated from the rest of the spectrum and that $k \mapsto P_*(k)$ is not smooth at the degeneracy point. In such a case, the corresponding Wannier function decreases only polynomially.

Composite Family of Bloch Bands

It is well known that, in dimension $d > 1$, the Bloch bands of crystalline solids are not, in general, isolated. Thus, the interesting problem, in view of real applications, concerns the case of composite families of bands, i.e., $m > 1$ in (6), and in this context, the more general notion of composite Wannier functions is relevant [1, 5]. Physically, condition (6) is always satisfied in semiconductors and insulators by considering the family of all the Bloch bands up to the Fermi energy.

Given a composite family of Bloch bands, we consider the orthogonal projector (in Dirac's notation)

$$P_*(k) := \sum_{i=n}^{n+m-1} |u_i(k)\rangle\langle u_i(k)|,$$

which is independent from the Bloch gauge, and we pose $P_* = \int^{\oplus}_{Y^*} P_*(k)\,dk$. A function $\chi$ is called a quasi-Bloch function if

$$P_*(k)\,\chi(k,\cdot) = \chi(k,\cdot) \quad\text{and}\quad \chi(k,\cdot) \ne 0 \quad \forall k \in Y^*. \qquad (7)$$

Although the terminology is not standard, we call Bloch frame a set $\{\chi_a\}_{a=1,\ldots,m}$ of quasi-Bloch functions such that $\{\chi_1(k),\ldots,\chi_m(k)\}$ is an orthonormal basis of $\mathrm{Ran}\,P_*(k)$ at (almost) every $k \in Y^*$. As in the previous case, there is a gauge ambiguity: a Bloch frame is fixed only up to a $k$-dependent unitary matrix $U(k) \in \mathcal{U}(m)$, i.e., if $\{\chi_a\}_{a=1,\ldots,m}$ is a Bloch frame then the functions $\tilde{\chi}_a(k) = \sum_{b=1}^m \chi_b(k)\,U_{b,a}(k)$ also define a Bloch frame.

Definition 2 The composite Wannier functions corresponding to a Bloch frame $\{\chi_a\}_{a=1,\ldots,m}$ are the functions

$$w_a(x) := \big(\tilde{U}_{BF}^{-1}\chi_a\big)(x), \qquad a \in \{1,\ldots,m\}.$$

As in the case of a single Bloch band, the exponential localization of the composite Wannier functions is equivalent to the analyticity of the corresponding Bloch frame (which, in addition, must be $\Gamma^*$-pseudoperiodic). As before, there might be topological obstruction to the existence of such a Bloch frame. As far as the operator (1) is concerned, the existence of exponentially localized composite Wannier functions has been proved in Nenciu [12] in dimension $d = 1$; as for $d > 1$, the problem remained unsolved for more than two decades, until recently [2, 16]. Notice that for magnetic periodic Schrödinger operators the existence of exponentially localized Wannier functions is generically false.

The Marzari–Vanderbilt Localization Functional

To circumvent the long-standing controversy about the existence of exponentially localized composite Wannier functions, and in view of the application to numerical simulations, the solid-state physics community preferred to introduce the alternative notion of maximally localized Wannier functions [11]. The latter are defined as the minimizers of a suitable localization functional, known as the Marzari–Vanderbilt (MV) functional. For a single-band normalized Wannier function $w \in L^2(\mathbb{R}^d)$, the localization functional is

$$F_{MV}(w) = \int_{\mathbb{R}^d} |x|^2 |w(x)|^2\,dx - \sum_{j=1}^d \Big(\int_{\mathbb{R}^d} x_j\,|w(x)|^2\,dx\Big)^2, \qquad (8)$$

which is well defined at least whenever $\int_{\mathbb{R}^d} |x|^2 |w(x)|^2\,dx < +\infty$. More generally, for a system of $L^2$-normalized composite Wannier functions $w = \{w_1,\ldots,w_m\} \subset L^2(\mathbb{R}^d)$, the Marzari–Vanderbilt localization functional is

$$F_{MV}(w) = \sum_{a=1}^m F_{MV}(w_a) = \sum_{a=1}^m \int_{\mathbb{R}^d} |x|^2 |w_a(x)|^2\,dx - \sum_{a=1}^m \sum_{j=1}^d \Big(\int_{\mathbb{R}^d} x_j\,|w_a(x)|^2\,dx\Big)^2. \qquad (9)$$

We emphasize that the above definition includes the crucial constraint that the corresponding Bloch functions $\varphi_a(k,\cdot) = (\tilde{U}_{BF}\,w_a)(k,\cdot)$, for $a \in \{1,\ldots,m\}$, are a Bloch frame.

While such an approach provided excellent results from the numerical viewpoint, the existence and exponential localization of the minimizers have been investigated only recently [17].

Dynamics in Macroscopic Electromagnetic Potentials

To model the transport properties of electrons in solids, one modifies the operator (1) to include the effect of the external electromagnetic potentials. Since the latter vary at the laboratory scale, it is natural to assume that the ratio $\varepsilon$ between the lattice constant $a = |Y|^{1/d}$ and the length-scale of variation of the external potentials is small, i.e., $\varepsilon \ll 1$. The original problem is replaced by

$$i\varepsilon\,\partial_\tau \psi(\tau,x) = \Big(\frac{1}{2}\big(-i\nabla_x - A(\varepsilon x)\big)^2 + V_\Gamma(x) + V(\varepsilon x)\Big)\psi(\tau,x) =: H_\varepsilon\,\psi(\tau,x) \qquad (10)$$

where $\tau = \varepsilon t$ is the macroscopic time, and $V \in C^\infty_b(\mathbb{R}^d,\mathbb{R})$ and $A_j \in C^\infty_b(\mathbb{R}^d,\mathbb{R})$, $j \in \{1,\ldots,d\}$, are respectively the external electrostatic and magnetic potentials. Hereafter, for the sake of a simpler notation, we consider only $d = 3$.

While the dynamical equation (10) is quantum mechanical, physicists argued [1] that for suitable wavepackets, which are localized on the $n$th Bloch band and spread over many lattice spacings, the main effect of the periodic potential $V_\Gamma$ is the modification of the relation between the momentum and the kinetic energy of the electron, from the free relation $E_{free}(k) = \frac{1}{2}k^2$ to the function $k \mapsto E_n(k)$ given by the $n$th Bloch band. Therefore, the semiclassical equations of motion are

$$\dot{r} = \nabla E_n(\kappa), \qquad \dot{\kappa} = -\nabla V(r) + \dot{r} \times B(r) \qquad (11)$$

where $r \in \mathbb{R}^3$ is the macroscopic position of the electron, $\kappa = k - A(r)$ is the kinetic momentum with $k \in \mathbb{T}^*_3$ the Bloch momentum, $-\nabla V$ the external electric field, and $B = \nabla \times A$ the external magnetic field.

In fact, one can derive also the first-order correction to (11). At this higher accuracy, the electron acquires an effective $k$-dependent electric moment $\mathcal{A}_n(k)$ and magnetic moment $\mathcal{M}_n(k)$. If the $n$th Bloch band is non-degenerate (hence isolated), the former is given by the Berry connection

$$\mathcal{A}_n(k) = i\,\langle u_n(k),\, \nabla_k u_n(k)\rangle_{\mathcal{H}_f} = i\int_Y u_n(k,y)^*\, \nabla_k u_n(k,y)\,dy,$$

and the latter reads $\mathcal{M}_n(k) = \frac{i}{2}\,\big\langle \nabla_k u_n(k)\, ,\times\, \big(H_{per}(k) - E_n(k)\big)\nabla_k u_n(k)\big\rangle_{\mathcal{H}_f}$, i.e., explicitly

$$\big(\mathcal{M}_n(k)\big)_i = \frac{i}{2} \sum_{1\le j,l\le 3} \epsilon_{ijl}\,\big\langle \partial_{k_j} u_n(k),\, \big(H_{per}(k) - E_n(k)\big)\,\partial_{k_l} u_n(k)\big\rangle_{\mathcal{H}_f}$$

where $\epsilon_{ijl}$ is the totally antisymmetric symbol. The refined semiclassical equations read

$$\begin{cases} \dot{r} = \nabla_\kappa\big(E_n(\kappa) - \varepsilon B(r)\cdot\mathcal{M}_n(\kappa)\big) - \varepsilon\,\dot{\kappa}\times\Omega_n(\kappa) \\ \dot{\kappa} = -\nabla_r\big(V(r) - \varepsilon B(r)\cdot\mathcal{M}_n(\kappa)\big) + \dot{r}\times B(r) \end{cases} \qquad (12)$$

where $\Omega_n(k) = \nabla\times\mathcal{A}_n(k)$ corresponds to the curvature of the Berry connection. The previous equations have a hidden Hamiltonian structure [14]. Indeed, by introducing the semiclassical Hamiltonian function $H_{sc}(r,\kappa) = E_n(\kappa) + V(r) - \varepsilon B(r)\cdot\mathcal{M}_n(\kappa)$, (12) become

$$\begin{pmatrix} B(r) & -I \\ I & \varepsilon\,\mathcal{A}_n(\kappa) \end{pmatrix} \begin{pmatrix} \dot{r} \\ \dot{\kappa} \end{pmatrix} = \begin{pmatrix} \nabla_r H_{sc}(r,\kappa) \\ \nabla_\kappa H_{sc}(r,\kappa) \end{pmatrix} \qquad (13)$$

where $I$ is the identity matrix and $B$ (resp. $\mathcal{A}_n$) is the $3\times 3$ matrix corresponding to the vector field $B$ (resp. $\Omega_n$), i.e., $B_{l,m}(r) = \sum_{1\le j\le 3} \epsilon_{lmj} B_j(r) = (\partial_l A_m - \partial_m A_l)(r)$. Since the matrix appearing on the l.h.s. corresponds to a symplectic form $\Theta_{B,\varepsilon}$ (i.e., a nondegenerate closed 2-form) on $\mathbb{R}^6$, (13) has Hamiltonian form with respect to $\Theta_{B,\varepsilon}$.
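As an illustration of (12) (ours, with made-up model functions for $E_n$, $\Omega_n$, $B$, and $V$, and with the magnetic-moment term $\mathcal{M}_n$ dropped for brevity), the refined equations can be integrated with a standard ODE solver after resolving the implicit appearance of $\dot{r}$ and $\dot{\kappa}$ on the right-hand sides by a fixed-point pass:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy integration of the refined semiclassical equations (12), dropping
# the magnetic-moment term. The model functions below are arbitrary
# placeholders, not derived from an actual periodic Hamiltonian.
eps = 0.1

def grad_E(kappa):           # model band velocity
    return kappa
def Omega(kappa):            # model Berry curvature along e_z
    return np.array([0.0, 0.0, 1.0 / (1.0 + kappa @ kappa)])
def B(r):                    # constant external magnetic field
    return np.array([0.0, 0.0, 0.5])
def grad_V(r):               # constant external electric force -grad V
    return np.array([-1.0, 0.0, 0.0])

def rhs(t, z):
    r, kappa = z[:3], z[3:]
    kdot = -grad_V(r)                   # first pass; rdot enters at O(eps)
    rdot = grad_E(kappa) - eps * np.cross(kdot, Omega(kappa))
    kdot = -grad_V(r) + np.cross(rdot, B(r))
    # one fixed-point refinement of the implicit rdot/kdot coupling
    rdot = grad_E(kappa) - eps * np.cross(kdot, Omega(kappa))
    return np.concatenate([rdot, kdot])

sol = solve_ivp(rhs, (0.0, 20.0), np.zeros(6), rtol=1e-9, atol=1e-9)
print(sol.y[:3, -1])   # the transverse drift reflects the Berry curvature
```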

The mathematical derivation of the semiclassical model (12) from (10) as $\varepsilon \to 0$ has been accomplished in Panati et al. [14]. The first-order correction to the semiclassical equations (11) was previously investigated in Sundaram and Niu [19], but the heuristic derivation in the latter paper does not yield the term of order $\varepsilon$ in the second equation. Without such a term, it is not clear if the equations have a Hamiltonian structure.

As for mathematically related problems, both the semiclassical asymptotics of the spectrum of $H_\varepsilon$ and the corresponding scattering problem have been studied in detail (see [7] and references therein). The effective quantum Hamiltonians corresponding to (10) for $\varepsilon \to 0$ have also been deeply investigated [13].

The connection between (10) and (12) can be expressed either by an Egorov-type theorem involving quantum observables, or by using Wigner functions. Here we focus on the second approach.

First we define the Wigner function. We consider the space $\mathcal{C} = C^\infty_b(\mathbb{R}^{2d})$ equipped with the standard distance $d_{\mathcal{C}}$, and the subspace of $\Gamma^*$-periodic observables

$$\mathcal{C}_{per} = \{a \in \mathcal{C} : a(r, k+\lambda) = a(r,k)\ \forall \lambda \in \Gamma^*\}.$$

Recall that, according to the Calderón-Vaillancourt theorem, there is a constant $C$ such that for $a \in \mathcal{C}$ its Weyl quantization $\hat{a} \in \mathcal{B}(L^2(\mathbb{R}^3))$ satisfies

$$|\langle \psi,\, \hat{a}\,\psi\rangle_{L^2(\mathbb{R}^3)}| \le C\, d_{\mathcal{C}}(a,0)\, \|\psi\|^2.$$

Hence, the map $\mathcal{C} \ni a \mapsto \langle \psi, \hat{a}\,\psi\rangle \in \mathbb{C}$ is linear continuous and thus defines an element $W^\varepsilon_\psi$ of the dual space $\mathcal{C}'$, the Wigner function of $\psi$. Writing

$$\langle \psi, \hat{a}\,\psi\rangle =: \langle W^\varepsilon_\psi,\, a\rangle_{\mathcal{C}',\mathcal{C}} =: \int_{\mathbb{R}^{2d}} a(q,p)\, W^\varepsilon_\psi(q,p)\,dq\,dp$$

and inserting the definition of the Weyl quantization for $a$, one arrives at the formula

$$W^\varepsilon_\psi(q,p) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} d\xi\; e^{i\xi\cdot p}\, \overline{\psi}\Big(q + \frac{\varepsilon\xi}{2}\Big)\, \psi\Big(q - \frac{\varepsilon\xi}{2}\Big), \qquad (14)$$

which yields $W^\varepsilon_\psi \in L^2(\mathbb{R}^{2d})$. Although $W^\varepsilon_\psi$ is real-valued, it attains also negative values in general, so it does not define a probability distribution on phase space.

After this preparation, we can vaguely state the link between (10) and (12); see [20] for the precise formulation. Let $E_n$ be an isolated, nondegenerate Bloch band. Denote by $\Phi^\tau_\varepsilon(r,k)$ the flow of the dynamical system (12) in canonical coordinates $(r,k) = (r, \kappa + A(r))$ (recall that the Weyl quantization, and hence the definition of the Wigner function, is not invariant under non-linear changes of canonical coordinates). Then for each finite time-interval $I \subset \mathbb{R}$ there is a constant $C$ such that for $\tau \in I$, $a \in \mathcal{C}_{per}$, and for $\psi_0$ "well-concentrated on the $n$th Bloch band", one has

$$\bigg|\int_{\mathbb{R}^{2d}} a(q,p)\,\Big(W^{\psi(\tau)}_\varepsilon(q,p) - W^{\psi_0}_\varepsilon \circ \Phi^{-\tau}_\varepsilon(q,p)\Big)\,dq\,dp\,\bigg| \le \varepsilon^2\, C\, d_{\mathcal{C}}(a,0)\, \|\psi_0\|^2,$$

where $\psi(\tau)$ is the solution to the Schrödinger equation (10) with initial datum $\psi_0$.

Slowly Varying Deformations and Piezoelectricity

To investigate the contribution of the electrons to the macroscopic polarization and to the piezoelectric effect, it is crucial to know how the electrons move in a crystal which is strained at the macroscopic scale. Assuming the usual fixed-lattice approximation, the problem can be reduced to studying the solutions to

$$i\,\partial_t \psi(t,x) = \Big(-\frac{1}{2}\Delta + V_\Gamma(x, \varepsilon t)\Big)\psi(t,x) \qquad (15)$$

for $\varepsilon \ll 1$, where $V_\Gamma(\cdot,t)$ is $\Gamma$-periodic for every $t \in \mathbb{R}$, i.e., the periodicity lattice does not depend on time. While a model with a fixed lattice might seem unrealistic at first glance, we refer to Resta [18] and King-Smith and Vanderbilt [9] for its physical justification. The analysis of the Hamiltonian $H(t) = -\frac{1}{2}\Delta + V_\Gamma(x,t)$ yields a family of time-dependent Bloch functions $\{u_n(k,t)\}_{n\in\mathbb{N}}$ and Bloch bands $\{E_n(k,t)\}_{n\in\mathbb{N}}$.

Assuming that the relevant Bloch band is isolated from the rest of the spectrum, so that (6) holds true at every time, and that the initial datum is well-concentrated on the $n$th Bloch band, one obtains a semiclassical description of the dynamics analogous to (12). In this case, the semiclassical equations read

$$\dot{r} = \nabla_k E_n(k,t) - \varepsilon\,\Theta_n(k,t), \qquad \dot{k} = 0 \qquad (16)$$

where

$$\Theta_n(k,t) = -\partial_t \mathcal{A}_n(k,t) - \nabla_k \phi_n(k,t)$$

with

$$\mathcal{A}_n(k,t) = i\,\langle u_n(k,t),\, \nabla_k u_n(k,t)\rangle_{\mathcal{H}_f}, \qquad \phi_n(k,t) = -i\,\langle u_n(k,t),\, \partial_t u_n(k,t)\rangle_{\mathcal{H}_f}.$$

The notation emphasizes the analogy with electromagnetism: if $\mathcal{A}_n(k,t)$ and $\phi_n(k,t)$ are interpreted as the geometric analogues of the vector potential and of the electrostatic scalar potential, then $\Theta_n(k,t)$ and $\Omega_n(k,t)$ correspond, respectively, to the electric and to the magnetic field.

One can rigorously connect (15) and the semiclassical model (16), in the spirit of the result stated at the end of the previous section; see [15]. From (16) one obtains the King-Smith and Vanderbilt formula [9], which approximately predicts the contribution $\Delta P$ of the electrons to the macroscopic polarization of a crystalline insulator strained in the time interval $[0,T]$, namely,

$$\Delta P = \frac{1}{(2\pi)^d} \sum_{n\in\mathbb{N}_{occ}} \int_{Y^*} \big(\mathcal{A}_n(k,T) - \mathcal{A}_n(k,0)\big)\,dk, \qquad (17)$$

where the sum runs over all the occupied Bloch bands, i.e., $\mathbb{N}_{occ} = \{n \in \mathbb{N} : E_n(k,t) \le E_F\}$ with $E_F$ the Fermi energy. Notice that (17) requires the computation of the Bloch functions only at the initial and at the final time; in view of that, the previous formula is the starting point of any state-of-the-art numerical simulation of macroscopic polarization in insulators.
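In numerical practice the $k$-integral of the Berry connection in (17) is evaluated gauge-invariantly on a grid as a discrete Berry phase, $-\mathrm{Im}\log\prod_i \langle u(k_i), u(k_{i+1})\rangle$. The sketch below (ours) does this for the lowest band of the 1D model used earlier, at a single snapshot of the potential; a realistic code would evaluate the phase at $t=0$ and $t=T$, band by band, and take differences.

```python
import numpy as np

# Discrete Berry phase of the lowest band for the 1D model
# H(k) = (-i d/dy + k)^2 + V(y), V(y) = 2*cos(y) + sin(y); the sin term
# breaks inversion symmetry, so the phase need not be 0 or pi.
M = 16
modes = np.arange(-M, M + 1)

def u_lowest(k):
    H = np.diag((modes + k) ** 2).astype(complex)
    H += np.diag(np.ones(2 * M), 1) + np.diag(np.ones(2 * M), -1)  # 2cos(y)
    H += (np.diag(np.full(2 * M, 0.5j), 1)
          + np.diag(np.full(2 * M, -0.5j), -1))                    # sin(y)
    return np.linalg.eigh(H)[1][:, 0]

Nk = 200
ks = -0.5 + np.arange(Nk) / Nk          # uniform grid on Y* = [-1/2, 1/2)
us = [u_lowest(k) for k in ks]

# Close the loop using the pseudoperiodicity u(k+1, y) = e^{-iy} u(k, y):
# in Fourier coefficients this shifts mode m+1 -> m.
u_closed = np.roll(us[0], -1)
u_closed[-1] = 0.0
overlaps = [np.vdot(us[i], us[i + 1]) for i in range(Nk - 1)]
overlaps.append(np.vdot(us[-1], u_closed))
berry_phase = -np.angle(np.prod(overlaps))   # = integral of A_n over Y*
print(berry_phase)
```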

Cross-References

▶Born–Oppenheimer Approximation, Adiabatic Limit, and Related Math. Issues
▶Mathematical Theory for Quantum Crystals

References

1. Blount, E.I.: Formalism of band theory. In: Seitz, F., Turnbull, D. (eds.) Solid State Physics, vol. 13, pp. 305–373. Academic, New York (1962)
2. Brouder, Ch., Panati, G., Calandra, M., Mourougane, Ch., Marzari, N.: Exponential localization of Wannier functions in insulators. Phys. Rev. Lett. 98, 046402 (2007)
3. Catto, I., Le Bris, C., Lions, P.-L.: On the thermodynamic limit for Hartree-Fock type problems. Ann. Henri Poincaré 18, 687–760 (2001)
4. Cancès, E., Deleurence, A., Lewin, M.: A new approach to the modeling of local defects in crystals: the reduced Hartree-Fock case. Commun. Math. Phys. 281, 129–177 (2008)
5. des Cloizeaux, J.: Analytical properties of n-dimensional energy bands and Wannier functions. Phys. Rev. 135, A698–A707 (1964)
6. Goedecker, S.: Linear scaling electronic structure methods. Rev. Mod. Phys. 71, 1085–1111 (1999)
7. Gérard, C., Martinez, A., Sjöstrand, J.: A mathematical approach to the effective Hamiltonian in perturbed periodic problems. Commun. Math. Phys. 142, 217–244 (1991)
8. Helffer, B., Sjöstrand, J.: Équation de Schrödinger avec champ magnétique et équation de Harper. In: Holden, H., Jensen, A. (eds.) Schrödinger Operators. Lecture Notes in Physics, vol. 345, pp. 118–197. Springer, Berlin (1989)
9. King-Smith, R.D., Vanderbilt, D.: Theory of polarization of crystalline solids. Phys. Rev. B 47, 1651–1654 (1993)
10. Kohn, W.: Analytic properties of Bloch waves and Wannier functions. Phys. Rev. 115, 809–821 (1959)
11. Marzari, N., Vanderbilt, D.: Maximally localized generalized Wannier functions for composite energy bands. Phys. Rev. B 56, 12847–12865 (1997)
12. Nenciu, G.: Existence of the exponentially localised Wannier functions. Commun. Math. Phys. 91, 81–85 (1983)
13. Nenciu, G.: Dynamics of band electrons in electric and magnetic fields: rigorous justification of the effective Hamiltonians. Rev. Mod. Phys. 63, 91–127 (1991)
14. Panati, G., Spohn, H., Teufel, S.: Effective dynamics for Bloch electrons: Peierls substitution and beyond. Commun. Math. Phys. 242, 547–578 (2003)
15. Panati, G., Sparber, Ch., Teufel, S.: Geometric currents in piezoelectricity. Arch. Ration. Mech. Anal. 191, 387–422 (2009)
16. Panati, G.: Triviality of Bloch and Bloch-Dirac bundles. Ann. Henri Poincaré 8, 995–1011 (2007)
17. Panati, G., Pisante, A.: Bloch bundles, Marzari-Vanderbilt functional and maximally localized Wannier functions. Preprint, arXiv.org (2011)
18. Resta, R.: Theory of the electric polarization in crystals. Ferroelectrics 136, 51–75 (1992)
19. Sundaram, G., Niu, Q.: Wave-packet dynamics in slowly perturbed crystals: gradient corrections and Berry-phase effects. Phys. Rev. B 59, 14915–14925 (1999)
20. Teufel, S., Panati, G.: Propagation of Wigner functions for the Schrödinger equation with a perturbed periodic potential. In: Blanchard, Ph., Dell'Antonio, G. (eds.) Multiscale Methods in Quantum Mechanics. Birkhäuser, Boston (2004)
21. Thonhauser, T., Ceresoli, D., Vanderbilt, D., Resta, R.: Orbital magnetization in periodic insulators. Phys. Rev. Lett. 95, 137205 (2005)
22. Wannier, G.H.: The structure of electronic excitation levels in insulating crystals. Phys. Rev. 52, 191–197 (1937)

Source Location

Victor Isakov
Department of Mathematics and Statistics, Wichita State University, Wichita, KS, USA

General Problem

Let $A$ be a partial differential operator of second order,

$$Au = f \quad\text{in } \Omega. \qquad (1)$$

In the inverse source problem, one is looking for the source term $f$ from the boundary data

$$u = g_0, \quad \partial_\nu u = g_1 \quad\text{on } \Gamma_0 \subset \partial\Omega, \qquad (2)$$

where $g_0, g_1$ are given functions. In this short expository note, we will try to avoid technicalities, so we assume that the (in general nonlinear) $A$ is defined by a known $C^2$-function and $f$ is a function of $x \in \Omega$, where $\Omega$ is a given bounded domain in $\mathbb{R}^n$ with $C^2$ boundary. $\nu$ denotes the exterior unit normal to the boundary of a domain. $H^k(\Omega)$ is the Sobolev space with the norm $\|\cdot\|_{(k)}(\Omega)$.

A first crucial question is whether there is enough data to (uniquely) find $f$. If $A$ is a linear operator, then the solution $f$ of this problem is not unique. Indeed, let $u_0$ be a function in the Sobolev space $H^2(\Omega)$ with zero Cauchy data $u_0 = \partial_\nu u_0 = 0$ on $\Gamma_0$, and let $f_0 = Au_0$. Due to linearity, $A(u + u_0) = f + f_0$. Obviously, $u$ and $u + u_0$ have the same Cauchy data on $\Gamma_0$, so $f$ and $f + f_0$ produce the same data (2), but they are different in $\Omega$. It is clear that there is a very large (infinite dimensional) manifold of solutions to the inverse source problem (1) and (2). To regain uniqueness, one has to restrict unknown distributions to a smaller but physically meaningful uniqueness class.

Inverse Problems of Potential Theory

We start with an inverse source problem which has a long and rich history. Let $\Phi$ be a fundamental solution of a linear second-order elliptic partial differential operator $A$ in $\mathbb{R}^n$. The potential of a (Radon) measure $\mu$ supported in $\Omega$ is

$$u(x;\mu) = \int_\Omega \Phi(x,y)\,d\mu(y). \qquad (3)$$

The general inverse problem of potential theory is to find $\mu$, $\mathrm{supp}\,\mu \subset \Omega$, from the boundary data (2).

Since $Au(\cdot;\mu) = \mu$ (in the generalized sense), the inverse problem of potential theory is a particular case of the inverse source problem. In the inverse problem of gravimetry, one considers $A = -\Delta$,

$$\Phi(x,y) = \frac{1}{4\pi|x-y|},$$

and the gravity field is generated by a volume mass distribution $f \in L^\infty(\Omega)$. We will identify $f$ with a measure $\mu$. Since $f$ with the data (2) is not unique, one can look for $f$ with the smallest ($L^2(\Omega)$-) norm. The subspace of harmonic functions $f_h$ is $L^2$-closed, so for any $f$, there is a unique $f_h$ such that $f = f_h + f_0$ where $f_0$ is ($L^2$-)orthogonal to $f_h$. Since the fundamental solution is a harmonic function of $y$ when $x$ is outside $\Omega$, the term $f_0$ produces zero potential outside $\Omega$. Hence, the harmonic orthogonal component of $f$ has the same exterior data and minimal $L^2$-norm. Applying the Laplacian to both sides of the equation $-\Delta u(\cdot;f_h) = f_h$, we arrive at the biharmonic equation $\Delta^2 u(\cdot;f_h) = 0$ in $\Omega$. When $\Gamma_0 = \partial\Omega$, we have a well-posed first boundary value problem for the biharmonic equation for $u(\cdot;f_h)$. Solving this problem, we find $f_h$ from the previous Poisson equation.


However, it is hard to interpret $f_h$ (geo)physically; knowing $f_h$ does not help much with finding $f$.

A (geo)physical intuition suggests looking for a perturbing inclusion $D$ of constant density, i.e., for $f = \chi_D$ (the characteristic function of an open set $D$).

Since (in the distributional sense) $-\Delta u(\cdot;\mu) = \mu$ in $\Omega$, by using Green's formula (or the definition of a weak solution), we obtain

$$-\int_\Omega u^*\,d\mu = \int_{\partial\Omega} \big((\partial_\nu u)\,u^* - (\partial_\nu u^*)\,u\big) \qquad (4)$$

for any function $u^* \in H^1(\Omega)$ which is harmonic in $\Omega$. If $\Gamma_0 = \partial\Omega$, then the right side in (4) is known; we are given all harmonic moments of $\mu$. In particular, letting $u^* = 1$, we obtain the total mass of $\mu$, and by letting $u^*$ be coordinate (linear) functions, we obtain the moments of $\mu$ of first order and hence the center of gravity of $\mu$.

Even when one assumes that $f = \chi_D$, there is nonuniqueness due to the possible disconnectedness of the complement of $D$. Indeed, it is well known that if $D$ is the ball $B(a,R)$ with center $a$ of radius $R$, then its Newtonian potential is $u(x;D) = M\,\frac{1}{4\pi|x-a|}$, where $M$ is the total mass of $D$. So the exterior potentials of all annuli $B(a,R_2)\setminus B(a,R_1)$ are the same when $R_2^3 - R_1^3 = C$, where $C$ is a positive constant. Moreover, by using this simple example and some reflections in $\mathbb{R}^n$, one can find two different domains with connected boundaries and equal exterior Newtonian potentials. Augmenting this construction by the condensation of singularities argument from the theory of functions of complex variables, one can construct a continuum of different domains with connected boundaries and the same exterior potential. So there is a need to have geometrical conditions on $D$.

A domain $D$ is called star shaped with respect to a point $a$ if any ray originating at $a$ intersects $D$ over an interval. An open set $D$ is $x_1$-convex if any straight line parallel to the $x_1$-axis intersects $D$ over an interval.

In what follows, $\Gamma_0$ is a non-void open subset of $\partial\Omega$.

Theorem 1 Let $D_1, D_2$ be two domains which are star shaped with respect to their centers of gravity, or two $x_1$-convex domains in $\mathbb{R}^n$. Let $u_1, u_2$ be the potentials of $D = D_1, D_2$.

If $u_1 = u_2$ and $\partial_\nu u_1 = \partial_\nu u_2$ on $\Gamma_0$, then $D_1 = D_2$.

Returning to the uniqueness proof, we assume that there are two $x_1$-convex domains $D_1, D_2$ with the same data. By uniqueness in the Cauchy problem for the Laplace equation, $u_1 = u_2$ near $\partial\Omega$. Then from (4) (with $d\mu = (\chi_{D_1} - \chi_{D_2})\,dm$, where $dm$ is the Lebesgue measure),

$$\int_{D_1} u^* = \int_{D_2} u^*$$

for any function $u^*$ which is harmonic in $\Omega$. Novikov's method of orthogonality is to assume that $D_1$ and $D_2$ are different and then to select $u^*$ in such a way that the left integral is less than the right one. To achieve this goal, $u^*$ is replaced by its derivative, and one integrates by parts to move integrals to boundaries and makes use of the maximum principles to bound interior integrals.

The inverse problem of potential theory is a severely (exponentially) ill-conditioned problem of mathematical physics. The character of stability, conditional stability estimates, and regularization methods for the numerical solution of such problems have been studied starting from the pioneering work of Fritz John and Tikhonov in the 1950s–1960s.

To understand the degree of ill conditioning, one can consider harmonic continuation from the circle $\Gamma_0 = \{x : |x| = R\}$ onto the inner circle $\Gamma = \{x : |x| = \rho\}$. By using polar coordinates $(r,\phi)$, any harmonic function decaying at infinity can be (in a stable way) approximated by $u(r,\phi;M) = \sum_{m=1}^M u_m\, r^{-m} e^{im\phi}$. Let us define the linear operator of the continuation as $A(\partial_r u(\cdot, R)) = \partial_r u(\cdot, \rho)$. Using the formula for $u(\cdot;M)$, it is easy to see that the condition number of the corresponding matrix is $(\frac{R}{\rho})^M$, which grows exponentially with respect to $M$. If $\frac{R}{\rho} = 10$, then the use of computers is only possible when $M < 16$, and typical practical measurement errors of $0.01$ allow meaningful computational results when $M < 3$.
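A quick numeric check of this conditioning argument (ours; it simply tabulates $(R/\rho)^M$ against a 1% data error and double-precision roundoff, rather than assembling the continuation matrix):

```python
import numpy as np

# Condition number (R/rho)^M of harmonic continuation from |x| = R
# to |x| = rho, compared with a 1% data error and machine precision.
R_over_rho = 10.0
for M in (3, 8, 16, 20):
    cond = R_over_rho ** M
    print(f"M = {M:2d}: condition number = {cond:.1e}, "
          f"amplified 1% error = {0.01 * cond:.1e}, "
          f"amplified machine eps = {np.finfo(float).eps * cond:.1e}")
# With 1% errors, only a handful of modes (M < 3) give O(1)-accurate
# results; in double precision, M around 16 exhausts the ~1e-16 accuracy.
```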

The following logarithmic stability estimate holds and can be shown to be best possible. We denote by $|\cdot|_2(S^2)$ the standard norm in the space $C^2(S^2)$.

Theorem 2 Let $D_1, D_2$ be two domains given in polar coordinates $(r,\phi)$ by the equations $\partial D_j = \{r = d_j(\phi)\}$, where $|d_j|_2(S^2) \le M_2$ and $\frac{1}{M_2} < d_j$, $j = 1,2$. Let $\varepsilon = \|u_1 - u_2\|_{(1)}(\Gamma_0) + \|\partial_\nu(u_1 - u_2)\|_{(0)}(\Gamma_0)$.

Then there is a constant $C$ depending only on $M_2, \Gamma_0$ such that $|d_1 - d_2| \le C\,(-\log\varepsilon)^{-\frac{1}{C}}$.

A proof in [4] uses some ideas from the proof of Theorem 1 and stability estimates for harmonic continuation.

Moreover, while it is not possible to obtain (even local) existence results in general, a special local existence theorem is available ([4], chapter 5). In more detail, if one assumes that $u_0$ is a potential of some $C^3$-domain $D_0$, that the Cauchy data for a function $u$ are close to the Cauchy data of $u_0$, and that, moreover, $u$ admits harmonic continuation across $\partial D_0$, as well as suitable behavior at infinity, then $u$ is a potential of a domain $D$ which is close to $D_0$.

The exterior gravity field of a polygon (polyhedron) $D$ develops singularities at the corner points of $D$. Indeed, $\partial_j\partial_k u(x;\chi_D)$, where $D$ is a polyhedron with a corner at $x_0$, behaves as $-C\log|x - x_0|$ ([4], section 4.1). Since these singularities are uniquely identified by the Cauchy data, one has obvious uniqueness results under mild geometrical assumptions on $D$. Moreover, the use of singularities provides us with constructive identification tools, based on range-type algorithms in the harmonic continuation, using, for example, the operator of the single layer potential.

For proofs and further results on inverse problems of potential theory, we refer to the work of V. Ivanov, Isakov, and Prilepko [4, 7].

An inverse source problem for nonlinear elliptic equations arises when detecting a doping profile (the source term in equations modeling semiconductors).

In the inverse problem of magnetoencephalography, $A$ is defined to be the Maxwell system, and $f$ is a first-order distribution supported in $\Omega$ (e.g., the head of a patient). As above, there are difficulties due to nonuniqueness and severe instability. One of the simple cases is when $f = \sum_{m=1}^M a_m\, \partial_{d(m)}\delta(\cdot - x(m))$, where $\delta(\cdot - x(m))$ is the Dirac delta function with pole $x(m)$ and $d(m)$ is a direction. Then uniqueness of $f$ is obvious, and for not large $M$, the problem of determining $a_m, x(m)$ is well conditioned. However, such a simplification is not satisfactory for medical diagnostics. For simplicity of exposition, we now let $A = -\Delta$. One of the more realistic assumptions is that $f$ is a double layer distributed with density $g$ over a so-called cortical surface $\Gamma$, i.e., $f = g\,\partial_\nu\, d\Gamma$. $\Gamma$ can be found by using different methods, so one can assume that it is known. So one looks for a function $g \in L^\infty(\Gamma)$ on $\Gamma$ from the Cauchy data (2) for the double layer potential

$$u(x;f) = \int_\Gamma g(y)\,\partial_{\nu(y)}\Phi(x,y)\,d\Gamma(y).$$

Uniqueness of $g$ (up to a constant) is obvious, and stability is similar to the inverse problem of gravimetry.

For biomedical inverse (source) problems, we refer to [1].

Finding Sources of Stationary Waves

Stationary waves of frequency $k$ are, in the simplest case, solutions to the Helmholtz equation, i.e., $A = -\Delta - k^2$. The radiating fundamental solution of this equation is

$$\Phi(x,y) = \frac{e^{ik|x-y|}}{4\pi|x-y|}.$$

The inverse source problem at a fixed $k$ has many similarities with the case $k = 0$, except that maximum principles are not valid anymore. In particular, Theorem 1 is not true: the potential of a ball, $u(\cdot;\chi_B)$, can be zero outside an $\Omega$ containing $B$ for certain choices of $k$ and the radius of $B$.

Looking for $f$ supported in $\bar{D}$ ($D$ a subdomain of $\Omega$) can be viewed as finding acoustical sources distributed over $\bar{D}$. Besides, this inverse source problem has immediate applications to so-called acoustical holography. This is a method to detect (mechanical) vibrations of $\Gamma = \partial D$ from measurements of the acoustical pressure $u$ on $\Gamma_0 \subset \partial\Omega$. In simple accepted models, the normal speed of $\Gamma$ is $\partial_\nu u$ on $\Gamma$. By solving the exterior Dirichlet problem for the Helmholtz equation outside $\Omega$, one can uniquely and in a stable way determine $\partial_\nu u$ on $\Gamma_0$. One can show that if $k$ is not a Dirichlet eigenvalue, then any $H^1(\Omega)$ solution $u$ to the Helmholtz equation can be uniquely represented by $u(\cdot;g\,d\Gamma)$, so we can reduce the continuation problem to the inverse source problem for $\mu = g\,d\Gamma$ (a single layer distribution over $\Gamma$).

The continuation of solutions of the Helmholtz equation is a severely ill-posed problem, but its ill conditioning decreases as $k$ grows, and if one is looking for the "low frequency" part of $g$, then stability is Lipschitz. This "low frequency" part increases with growing $k$.

As above, the inverse source problem at fixed $k$ has similar uniqueness features. However, if $f = f_0 + kf_1$, where $f_0, f_1$ depend only on $x$, one regains uniqueness. This statement is easier to understand by considering $u$ as the time Fourier transform of a solution of a wave equation with $f_0, f_1$ as the initial data. In transient (nonstationary) problems, one collects additional boundary data over a period of time.


Hyperbolic Equations

The inverse source problem in a wave motion is to find $(u,f) \in H^2(\Omega)\times L^2(\Omega)$ from the following partial differential equation

$$\partial_t^2 u - \Delta u = f, \quad \partial_t^2 f = 0 \quad\text{in } \Omega = G\times(0,T),$$

with natural lateral boundary and initial conditions

$$\partial_\nu u = 0 \text{ on } \partial G\times(0,T), \qquad u = \partial_t u = 0 \text{ on } G\times\{0\}, \qquad (5)$$

and the additional data

$$u = g \quad\text{on } \Gamma_0 = S_0\times(0,T),$$

where $S_0$ is a part of $\partial G$. Assuming $\partial_t^2 u \in H^2(\Omega)$, letting

$$v = \partial_t^2 u \qquad (6)$$

and differentiating twice with respect to $t$ transforms this problem into finding the initial data in the following hyperbolic mixed boundary value problem:

$$\partial_t^2 v - \Delta v = 0 \quad\text{in } \Omega, \qquad (7)$$

with the lateral boundary condition

$$\partial_\nu v = 0 \quad\text{on } \partial G\times(0,T), \qquad (8)$$

from the additional data

$$v = \partial_t^2 g \quad\text{on } \Gamma_0. \qquad (9)$$

Indeed, one can find $u$ from (6) and the initial conditions (5).

Theorem 3 Let

$$2\,\mathrm{dist}(x, S_0; G) < T, \qquad x \in \partial G.$$

Then the data (9) on $\Gamma_0$ for a solution $v$ of (7) and (8) uniquely determine $v$ on $\Omega$.

If, in addition, $S_0 = \partial G$, then

$$\|v(\cdot,0)\|_{(1)}(G) + \|\partial_t v(\cdot,0)\|_{(0)}(G) \le C\,\|\partial_t^2 g\|_{(1)}(\Gamma_0). \qquad (10)$$

Here, $\mathrm{dist}(x, S_0; G)$ is the (minimal) distance from $x$ to $S_0$ inside $G$.

The statement about uniqueness for arbitrary $S_0$ follows from sharp uniqueness of continuation results for second-order hyperbolic equations and some geometric ideas ([5], section 3.4). For hyperbolic equations with analytic coefficients, these sharp results are due to Fritz John and are based on the Holmgren Theorem. For $C^\infty$ space coefficients, the Holmgren Theorem was extended by Tataru. Stability of continuation (and hence in the inverse source problem) is (as for harmonic continuation) at best of logarithmic type (i.e., we have a severely ill-conditioned inverse problem).

When $S_0 = \partial G$, one has the very strong (best possible) Lipschitz stability estimate (10). This estimate was obtained by Lop-Fat Ho (1986) by using the technique of multipliers; for more general hyperbolic equations by Klibanov, Lasiecka, Tataru, and Triggiani (1990s) by using Carleman-type estimates; and by Bardos, Lebeau, and Rauch (1992) by propagation of singularities arguments. Similar results are available for general linear hyperbolic equations of second order with time-independent coefficients. However, for Lipschitz stability, one has to assume the existence of a suitable pseudo-convex function or the absence of trapped bicharacteristics. Looking for the source of wave motion (in the more complicated elasticity system) can, in particular, be interpreted as finding the location and intensity of earthquakes. The recent medical diagnostic technique called thermoacoustical tomography can be reduced to looking for the initial displacement $u_0$. One of the versions of this problem is the classical one of finding a function from its spherical means. In a limiting case when the radii of the spheres become large, one arrives at one of the most useful problems of tomography, whose mathematical theory was initiated by Radon (1917) and Fritz John (1940s). For a recent advance in tomography in the case of general attenuation, we refer to [2]. Detailed references are in [4, 5].

In addition to direct applications, inverse source problems represent linearizations of (nonlinear) problems of finding coefficients of partial differential equations and can be used in the study of uniqueness and stability of the identification of coefficients. For example, subtracting the two equations $\partial_t^2 u_2 - a_2\Delta u_2 = 0$ and $\partial_t^2 u_1 - a_1\Delta u_1 = 0$ yields $\partial_t^2 u - a_2\Delta u = \alpha f$ with $\alpha = \Delta u_1$ (as a known weight function) and unknown $f = a_2 - a_1$. A general technique to show uniqueness and stability of such inverse source problems by utilizing Carleman estimates was introduced in [3].


Acknowledgements This research was in part supported by the NSF grant DMS 10-07734 and by the Emylou Keith and Betty Dutcher Distinguished Professorship at WSU.

References

1. Ammari, H., Kang, H.: An Introduction to Mathematics of Emerging Biomedical Imaging. Springer, Berlin (2008)
2. Arbuzov, E.V., Bukhgeim, A.L., Kazantsev, S.G.: Two-dimensional tomography problems and the theory of A-analytic functions. Siber. Adv. Math. 8, 1–20 (1998)
3. Bukhgeim, A.L., Klibanov, M.V.: Global uniqueness of a class of multidimensional inverse problems. Sov. Math. Dokl. 24, 244–247 (1981)
4. Isakov, V.: Inverse Source Problems. American Mathematical Society, Providence (1990)
5. Isakov, V.: Inverse Problems for PDE. Springer, New York (2006)
6. Novikov, P.: Sur le problème inverse du potentiel. Dokl. Akad. Nauk SSSR 18, 165–168 (1938)
7. Prilepko, A.I., Orlovskii, D.G., Vasin, I.A.: Methods for Solving Inverse Problems in Mathematical Physics. Marcel Dekker, New York (2000)

Sparse Approximation

Holger Rauhut
Lehrstuhl C für Mathematik (Analysis), RWTH Aachen University, Aachen, Germany

Definition

The aim of sparse approximation is to represent an object – usually a vector, matrix, function, image, or operator – by a linear combination of only few elements from a basis, or more generally, from a redundant system such as a frame. The tasks at hand are to design efficient computational methods for finding sparse representations and to estimate the approximation error that can be achieved for certain classes of objects.

Overview

Sparse approximations are motivated by several types of applications. An important source is the various tasks in signal and image processing, where it is an empirical finding that many types of signals and images can indeed be well approximated by a sparse representation in an appropriate basis/frame. Concrete applications include compression, denoising, signal separation, and signal reconstruction (compressed sensing).

On the one hand, the theory of sparse approximation is concerned with identifying the types of vectors, functions, etc. which can be well approximated by a sparse expansion in a given basis or frame, and with quantifying the approximation error. For instance, when given a wavelet basis, these questions relate to the area of function spaces, in particular, Besov spaces. On the other hand, algorithms are required to actually find a sparse approximation to a given vector or function. In particular, if the frame at hand is redundant, or if only incomplete information is available – as is the case in compressed sensing – this is a nontrivial task. Several approaches are available, including convex relaxation ($\ell_1$-minimization), greedy algorithms, and certain iterative procedures.

Sparsity

Let $x$ be a vector in $\mathbb{R}^N$ or $\mathbb{C}^N$ or $\ell_2(\Gamma)$ for some possibly infinite set $\Gamma$. We say that $x$ is $s$-sparse if

$$\|x\|_0 := \#\{\ell : x_\ell \neq 0\} \le s.$$

For a general vector $x$, the error of best $s$-term approximation quantifies the distance to sparse vectors,

$$\sigma_s(x)_p := \inf_{z : \|z\|_0 \le s} \|x - z\|_p.$$

Here, $\|x\|_p = (\sum_j |x_j|^p)^{1/p}$ is the usual $\ell_p$-norm for $0 < p < \infty$ and $\|x\|_\infty = \sup_j |x_j|$. Note that the vector $z$ minimizing $\sigma_s(x)_p$ equals $x$ on the indices corresponding to the $s$ largest absolute coefficients of $x$ and is zero on the remaining indices. We say that $x$ is compressible if $\sigma_s(x)_p$ decays quickly in $s$, that is, for suitable $s$ we can approximate $x$ well by an $s$-sparse vector. This occurs, for instance, in the particular case when $x$ is taken from the $\ell_q$-unit ball $B_q = \{x : \|x\|_q \le 1\}$ for small $q$. Indeed, an inequality due to Stechkin (see, e.g., [21, Lemma 3.1]) states that, for $0 < q < p$,

$$\sigma_s(x)_p \le s^{1/p - 1/q}\, \|x\|_q. \qquad (1)$$


This inequality highlights the importance of $\ell_q$-spaces with $q < 1$ in this context.
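A minimal Python sketch of the best $s$-term approximation error defined above (the function names and the test vector are illustrative choices, not part of the standard theory):

```python
import numpy as np

def best_s_term(x, s):
    """Best s-term approximation: keep the s largest entries in absolute value."""
    z = np.zeros_like(x)
    if s > 0:
        idx = np.argsort(np.abs(x))[-s:]   # indices of the s largest |x_j|
        z[idx] = x[idx]
    return z

def sigma_s(x, s, p=2.0):
    """Error of best s-term approximation in the l_p norm."""
    return np.linalg.norm(x - best_s_term(x, s), ord=p)

# A compressible vector: entries decaying like 1/j, so sigma_s decays in s.
x = 1.0 / np.arange(1, 101)
for s in (5, 10, 20):
    print(s, sigma_s(x, s))
```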

The situation above describes sparsity with respect to the canonical basis. For a more general setup, consider a (finite- or infinite-dimensional) Hilbert space $\mathcal{H}$ (often a space of functions) endowed with an orthonormal basis $\{\psi_j : j \in J\}$. Given an element $f \in \mathcal{H}$, our aim is to approximate it by a finite linear combination of the $\psi_j$, that is, by

$$\sum_{j \in S} x_j \psi_j,$$

where $S \subset J$ is of cardinality at most $s$, say. In contrast to linear approximation, the index set $S$ is not fixed a priori but is allowed to depend on $f$. Analogously as above, the error of best $s$-term approximation is then defined as

$$\sigma_s(f)_{\mathcal{H}} := \inf_{x : \|x\|_0 \le s} \Big\| f - \sum_{j \in J} x_j \psi_j \Big\|_{\mathcal{H}},$$

and the element $\sum_j x_j \psi_j$ with $\|x\|_0 \le s$ realizing the infimum is called a best $s$-term approximation to $f$. Since the support set of $x$ (i.e., the index set of nonzero entries of $x$) is not fixed a priori, the set of such elements does not form a linear space, so that one sometimes simply refers to nonlinear approximation [12, 30].

One may generalize this setup further. For instance, instead of requiring that $\{\psi_j : j \in J\}$ forms an orthonormal basis, one may assume that it is a frame [7, 19, 23], that is, there are constants $0 < A \le B < \infty$ such that

$$A\, \|f\|_{\mathcal{H}}^2 \le \sum_{j \in J} |\langle \psi_j, f \rangle|^2 \le B\, \|f\|_{\mathcal{H}}^2.$$

This definition includes orthonormal bases but also allows redundancy, that is, the coefficient vector $x$ in the expansion $f = \sum_{j \in J} x_j \psi_j$ is no longer unique. Redundancy has several advantages. For instance, since there are more possibilities for a sparse approximation of $f$, the error of $s$-sparse approximation may potentially be smaller. On the other hand, it may be harder to actually find a sparse approximation (see also below).

In another direction, one may relax the assumption that $\mathcal{H}$ is a Hilbert space and only require it to be a Banach space. Clearly, the notion of an orthonormal basis then no longer makes sense, so that $\{\psi_j : j \in J\}$ is just some system of elements spanning the space – possibly a basis.

Important types of systems $\{\psi_j : j \in J\}$ considered in this context include the trigonometric system $\{e^{2\pi i k \cdot} : k \in \mathbb{Z}\} \subset L_2[0,1]$, wavelet systems [9, 36], and Gabor frames [23].

Quality of a Sparse Approximation

One important task in the field of sparse approximation is to quantify how well an element $f \in \mathcal{H}$, or a whole class $\mathcal{B} \subset \mathcal{H}$ of elements, can be approximated by sparse expansions. An abstract way [12, 20] of describing good approximation classes is to introduce

$$\mathcal{B}_p := \Big\{ f \in \mathcal{H} : f = \sum_{j \in J} x_j \psi_j,\ \|x\|_p < \infty \Big\}$$

with norm $\|f\|_{\mathcal{B}_p} = \inf\{\|x\|_p : f = \sum_j x_j \psi_j\}$. If $\{\psi_j : j \in J\}$ is an orthonormal basis, then it follows directly from (1) that, for $0 < p < 2$,

$$\sigma_s(f)_{\mathcal{H}} \le s^{1/2 - 1/p}\, \|f\|_{\mathcal{B}_p}. \qquad (2)$$

In concrete situations, the task is then to characterize the spaces $\mathcal{B}_p$. If $\{\psi_j : j \in J\}$ is a wavelet system, one obtains Besov spaces [32, 36], and in the case of the trigonometric system, this results in the classical Fourier algebra when $p = 1$. If $\{\psi_j : j \in J\}$ is a frame, then (2) remains valid up to a multiplicative constant. In the special case of Gabor frames, the space $\mathcal{B}_p$ coincides with a class of modulation spaces [23].

Algorithms for Sparse Approximation

For practical purposes, it is important to have algorithms for computing optimal or at least near-optimal sparse approximations. When $\mathcal{H} = \mathbb{C}^N$ is finite dimensional and $\{\psi_j : j = 1, \dots, N\} \subset \mathbb{C}^N$ is an orthonormal basis, this is easy. In fact, the coefficients in the expansion $f = \sum_{j=1}^N x_j \psi_j$ are given by $x_j = \langle f, \psi_j \rangle$, so that a best $s$-term approximation to $f$ in $\mathcal{H}$ is given by

$$\sum_{j \in S} \langle f, \psi_j \rangle\, \psi_j,$$

where $S$ is an index set of the $s$ largest absolute entries of the vector $(\langle f, \psi_j \rangle)_{j=1}^N$.

When $\{\psi_j : j = 1, \dots, M\} \subset \mathbb{C}^N$ is redundant, that is, $M > N$, then it becomes a nontrivial problem to find the sparsest approximation to a given $f \in \mathbb{C}^N$. Denoting by $\Psi$ the $N \times M$ matrix whose columns are the vectors $\psi_j$, this problem can be expressed as finding the minimizer of

$$\min \|x\|_0 \quad \text{subject to} \quad \|\Psi x - f\|_2 \le \varepsilon, \qquad (3)$$

for a given threshold $\varepsilon > 0$. In fact, this problem is known to be NP-hard in general [11, 25]. Several tractable alternatives have been proposed. We discuss the greedy methods matching pursuit and orthogonal matching pursuit as well as the convex relaxation method basis pursuit ($\ell_1$-minimization) next. Other sparse approximation algorithms include iterative schemes, such as iterative hard thresholding [1] and iteratively reweighted least squares [10].

Matching Pursuits
Given a possibly redundant system $\{\psi_j : j \in J\} \subset \mathcal{H}$ – often called a dictionary – the greedy algorithm matching pursuit [24, 27, 31, 33] iteratively builds up the support set and the sparse approximation. Starting with $r_0 = f$, $S_0 = \emptyset$, and $k = 0$, it performs the following steps:

1. $j_k := \operatorname{argmax}\big\{ |\langle r_k, \psi_j \rangle| / \|\psi_j\| : j \in J \big\}$.
2. $S_{k+1} := S_k \cup \{j_k\}$.
3. $r_{k+1} = r_k - \dfrac{\langle r_k, \psi_{j_k} \rangle}{\|\psi_{j_k}\|^2}\, \psi_{j_k}$.
4. $k \mapsto k+1$.
5. Repeat from step (1) until a stopping criterion is met.
6. Output $\tilde f = \tilde f_k = \sum_{\ell=0}^{k-1} \dfrac{\langle r_\ell, \psi_{j_\ell} \rangle}{\|\psi_{j_\ell}\|^2}\, \psi_{j_\ell}$.

Clearly, if $s$ steps of matching pursuit are performed, then the output $\tilde f$ has an $s$-sparse representation with respect to $\{\psi_j\}$. It is known that the sequence $\tilde f_k$ converges to $f$ when $k$ tends to infinity [24]. A possible stopping criterion for step (5) is a maximal number of iterations, or that the residual norm satisfies $\|r_k\| \le \eta$ for some prescribed tolerance $\eta > 0$.
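A minimal Python sketch of matching pursuit for a real-valued dictionary stored as the columns of a NumPy array (the function name, iteration cap, and tolerance are illustrative; the complex case would use conjugated inner products):

```python
import numpy as np

def matching_pursuit(Psi, f, n_iter=50, tol=1e-8):
    """Matching pursuit: greedily subtract the best single-atom correction.

    Psi : (N, M) array whose columns are the dictionary atoms psi_j.
    Returns coefficients x and the final residual r, with f ~ Psi @ x + r.
    """
    norms = np.linalg.norm(Psi, axis=0)
    r = f.astype(float).copy()
    x = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        j = np.argmax(np.abs(Psi.T @ r) / norms)    # step 1: best atom
        c = (Psi[:, j] @ r) / norms[j] ** 2         # <r, psi_j> / ||psi_j||^2
        x[j] += c                                   # the same j may recur
        r = r - c * Psi[:, j]                       # step 3: update residual
        if np.linalg.norm(r) <= tol:                # stopping criterion
            break
    return x, r
```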

Matching pursuit has the slight disadvantage that the same index may be selected more than once. A variation on this greedy algorithm which avoids this drawback is the orthogonal matching pursuit algorithm [31, 33] outlined next. Again starting with $r_0 = f$, $S_0 = \emptyset$, and $k = 0$, the following steps are conducted:

1. $j_k := \operatorname{argmax}\big\{ |\langle r_k, \psi_j \rangle| / \|\psi_j\| : j \in J \big\}$.
2. $S_{k+1} := S_k \cup \{j_k\}$.
3. $x^{(k+1)} := \operatorname{argmin}_{z :\, \operatorname{supp}(z) \subset S_{k+1}} \big\| f - \sum_{j \in S_{k+1}} z_j \psi_j \big\|_2$.
4. $r_{k+1} := f - \sum_{j \in S_{k+1}} x^{(k+1)}_j \psi_j$.
5. Repeat from step (1) with $k \mapsto k+1$ until a stopping criterion is met.
6. Output $\tilde f = \tilde f_k = \sum_{j \in S_k} x^{(k)}_j \psi_j$.

The essential difference to matching pursuit is the orthogonal projection step in (3). Orthogonal matching pursuit may require a smaller number of iterations than matching pursuit. However, the orthogonal projection makes an iteration computationally more demanding than an iteration of matching pursuit.
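A corresponding sketch of orthogonal matching pursuit, with the least-squares projection of step (3) performed by numpy.linalg.lstsq (again an illustrative, real-valued implementation):

```python
import numpy as np

def orthogonal_matching_pursuit(Psi, f, s, tol=1e-8):
    """OMP: after each atom selection, re-project f onto all selected atoms."""
    norms = np.linalg.norm(Psi, axis=0)
    r = f.astype(float).copy()
    support = []
    x = np.zeros(Psi.shape[1])
    for _ in range(s):
        j = int(np.argmax(np.abs(Psi.T @ r) / norms))   # step 1
        if j not in support:
            support.append(j)                           # step 2
        coef, *_ = np.linalg.lstsq(Psi[:, support], f, rcond=None)  # step 3
        x[:] = 0.0
        x[support] = coef
        r = f - Psi[:, support] @ coef                  # step 4
        if np.linalg.norm(r) <= tol:
            break
    return x
```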

Convex Relaxation
A second tractable approach to sparse approximation is to relax the $\ell_0$-minimization problem to the convex optimization problem of finding the minimizer of

$$\min \|x\|_1 \quad \text{subject to} \quad \|\Psi x - f\|_2 \le \varepsilon. \qquad (4)$$

This program is also known as basis pursuit [6] and can be solved using various methods from convex optimization [2]. At least in the real-valued case, the minimizer $x^\ast$ of the above problem will always have at most $N$ nonzero entries, and the support of $x^\ast$ defines a linearly independent set $\{\psi_j : x^\ast_j \neq 0\}$, which is a basis of $\mathbb{C}^N$ if $x^\ast$ has exactly $N$ nonzero entries – thus the name basis pursuit.
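For real-valued data and the noiseless case $\varepsilon = 0$, problem (4) can be recast as a linear program via the standard substitution $x = u - v$ with $u, v \ge 0$. The following sketch uses scipy.optimize.linprog and is illustrative rather than a production solver; the noise-aware case $\varepsilon > 0$ would require a second-order cone program instead:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Psi, f):
    """min ||x||_1 subject to Psi x = f, via the LP substitution x = u - v."""
    N, M = Psi.shape
    c = np.ones(2 * M)                 # sum(u) + sum(v) = ||x||_1 at optimum
    A_eq = np.hstack([Psi, -Psi])      # Psi u - Psi v = f
    res = linprog(c, A_eq=A_eq, b_eq=f, bounds=[(0, None)] * (2 * M))
    u, v = res.x[:M], res.x[M:]
    return u - v
```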

Finding the Sparsest Representation
When the dictionary $\{\psi_j\}$ is redundant, it is of great interest to provide conditions which ensure that a specific algorithm is able to identify the sparsest possible representation. For this purpose, it is helpful to define the coherence $\mu$ of the system $\{\psi_j\}$, or equivalently of the matrix $\Psi$ having the vectors $\psi_j$ as its columns. Assuming the normalization $\|\psi_j\|_2 = 1$, it is defined as the maximal inner product between different dictionary elements,

$$\mu = \max_{j \neq k} |\langle \psi_j, \psi_k \rangle|.$$


Suppose that $f$ has a representation with $s$ terms, that is, $f = \sum_j x_j \psi_j$ with $\|x\|_0 \le s$. Then $s$ iterations of orthogonal matching pursuit [3, 33], as well as basis pursuit (4) with $\varepsilon = 0$ [3, 14, 34], find the sparsest representation of $f$ with respect to $\{\psi_j\}$ provided that

$$(2s - 1)\,\mu < 1. \qquad (5)$$

Moreover, for a general $f$, both orthogonal matching pursuit and basis pursuit generate an $s$-sparse approximation whose approximation error is bounded by the error of best $s$-term approximation up to constants; see [3, 18, 31] for details.

For typical “good” dictionaries $\{\psi_j\}_{j=1}^M \subset \mathbb{C}^N$, the coherence scales as $\mu \sim 1/\sqrt{N}$ [29], so that the bound (5) implies that $s$-sparse representations in such dictionaries with small enough sparsity, that is, $s \le c\sqrt{N}$, can be found efficiently via the described algorithms.

Applications of Sparse Approximation

Sparse approximations find a variety of applications. Below we briefly describe compression, denoising, and signal separation. Sparse representations also play a major role in adaptive numerical methods for solving operator equations such as PDEs. When the solution has a sparse representation with respect to a suitable basis, say finite elements or wavelets, then a significant acceleration with respect to standard linear methods can be achieved. The algorithms used in this context are of a different nature than the ones described above. We refer to [8] for details.

Compression
An obvious application of sparse approximation is image and signal compression. Once a sparse approximation is found, one only needs to store the nonzero coefficients of the representation. If the representation is sparse enough, then this requires significantly less memory than storing the original signal or image. This principle is exploited, for instance, in the JPEG, MPEG, and MP3 data compression standards.

Denoising
Often acquired signals and images are corrupted by noise, that is, the observed signal can be written as $\tilde f = f + \eta$, where $f$ is the original signal and $\eta$ is a vector representing the noise. The additional knowledge that the signal at hand can be approximated well by a sparse representation can be exploited to clean the signal by essentially removing the noise. The essential idea is to find a sparse approximation $\sum_j x_j \psi_j$ of $\tilde f$ with respect to a suitable dictionary $\{\psi_j\}$ and to use it as an approximation to the original $f$. One algorithmic approach is to solve the $\ell_1$-minimization problem (4), where $\varepsilon$ is now a suitable estimate of the $\ell_2$-norm of the noise $\eta$. If $\{\psi_j\}$ is a wavelet basis, this principle is often called wavelet thresholding or wavelet shrinkage [16], due to its connection with the soft-thresholding function [13].

Signal Separation
Suppose one observes the superposition $f = f_1 + f_2$ of two signals $f_1$ and $f_2$ of different nature, for instance, the “harmonic” and the “spiky” component of an acoustic signal, or stars and filaments in an astronomical image. The task is to separate the two components $f_1$ and $f_2$ from the knowledge of $f$. Knowing that both $f_1$ and $f_2$ have sparse representations in dictionaries $\{\psi^1_j\}$ and $\{\psi^2_j\}$ of “different nature,” one can indeed recover both $f_1$ and $f_2$ by similar algorithms as outlined above, for instance, by solving the $\ell_1$-minimization problem

$$\min_{z^1, z^2}\ \|z^1\|_1 + \|z^2\|_1 \quad \text{subject to} \quad f = \sum_j z^1_j \psi^1_j + \sum_j z^2_j \psi^2_j.$$

The solution $(x^1, x^2)$ defines the reconstructions $\tilde f_1 = \sum_j x^1_j \psi^1_j$ and $\tilde f_2 = \sum_j x^2_j \psi^2_j$. If, for instance, $\{\psi^1_j\}_{j=1}^N$ and $\{\psi^2_j\}_{j=1}^N$ are mutually incoherent bases in $\mathbb{C}^N$, that is, $\|\psi^1_j\|_2 = \|\psi^2_j\|_2 = 1$ for all $j$ and the maximal inner product $\mu = \max_{j,k} |\langle \psi^1_j, \psi^2_k \rangle|$ is small, then the above optimization problem recovers both $f_1$ and $f_2$ provided they have representations with altogether $s$ terms, where $s < 1/(2\mu)$ [15]. An example of two mutually incoherent bases is given by the Fourier basis and the canonical basis, where $\mu = 1/\sqrt{N}$ [17]. Under a probabilistic model, better estimates are possible [5, 35].

Compressed Sensing

The theory of compressed sensing [4, 21, 22, 26] builds on sparse representations. Assuming that a vector $x \in \mathbb{C}^N$ is $s$-sparse (or well approximated by a sparse vector), one would like to reconstruct it from only limited information, that is, from

$$y = Ax, \quad \text{with } A \in \mathbb{C}^{m \times N},$$

where $m$ is much smaller than $N$. Again the algorithms outlined above apply, for instance, basis pursuit (4) with $A$ replacing $\Psi$. In this context, one would like to design matrices $A$ with the minimal number $m$ of rows (i.e., the minimal number of linear measurements) required to reconstruct $x$ from $y$. The recovery criterion based on the coherence $\mu$ of $A$ described above applies but is highly suboptimal. In fact, it can be shown that for certain random matrices, $m \ge c\, s \log(eN/s)$ measurements suffice to (stably) reconstruct an $s$-sparse vector using $\ell_1$-minimization with high probability, where $c$ is a (small) universal constant. This bound is significantly better than the ones that can be deduced from the coherence-based bounds described above. A particular case of interest arises when $A$ consists of randomly selected rows of the discrete Fourier transform matrix. This setup corresponds to randomly sampling entries of the Fourier transform of a sparse vector. When $m \ge c\, s \log^4 N$, then $\ell_1$-minimization succeeds in (stably) recovering $s$-sparse vectors from the $m$ samples [26, 28].

This setup generalizes to the situation where one takes limited measurements of a vector $f \in \mathbb{C}^N$ which is sparse with respect to a basis or frame $\{\psi_j\}_{j=1}^M$. In fact, then $f = \Psi x$ for a sparse $x \in \mathbb{C}^M$, and with a measurement matrix $A \in \mathbb{C}^{m \times N}$, we have

$$y = Af = A\Psi x,$$

so that we reduce to the initial situation with $A' = A\Psi$ replacing $A$. Once $x$ is recovered, one forms $f = \Psi x$.

Applications of compressed sensing can be found in various signal processing tasks, for instance, in medical imaging, analog-to-digital conversion, and radar.
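An illustrative experiment in the spirit of this section, reusing the basis_pursuit sketch given earlier with a random Gaussian measurement matrix (all sizes and the random seed are arbitrary choices):

```python
import numpy as np

# Recover a 5-sparse vector in R^128 from 40 Gaussian measurements.
rng = np.random.default_rng(0)
N, m, s = 128, 40, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)
x_true = np.zeros(N)
x_true[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
x_rec = basis_pursuit(A, A @ x_true)     # basis_pursuit as sketched above
print(np.linalg.norm(x_rec - x_true))    # near zero with high probability
```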

References

1. Blumensath, T., Davies, M.: Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008)

2. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

3. Bruckstein, A., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009)

4. Candès, E.J.: Compressive sampling. In: Proceedings of the International Congress of Mathematicians, Madrid (2006)

5. Candès, E.J., Romberg, J.K.: Quantitative robust uncertainty principles and optimally sparse decompositions. Found. Comput. Math. 6(2), 227–254 (2006)

6. Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1998)

7. Christensen, O.: An Introduction to Frames and Riesz Bases. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston (2003)

8. Cohen, A.: Numerical Analysis of Wavelet Methods. Studies in Mathematics and its Applications, vol. 32. North-Holland, Amsterdam (2003)

9. Daubechies, I.: Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61. SIAM, Philadelphia (1992)

10. Daubechies, I., DeVore, R.A., Fornasier, M., Güntürk, C.: Iteratively reweighted least squares minimization for sparse recovery. Commun. Pure Appl. Math. 63(1), 1–38 (2010)

11. Davis, G., Mallat, S., Avellaneda, M.: Adaptive greedy approximations. Constr. Approx. 13(1), 57–98 (1997)

12. DeVore, R.A.: Nonlinear approximation. Acta Numer. 7, 51–150 (1998)

13. Donoho, D.: De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41(3), 613–627 (1995)

14. Donoho, D.L., Elad, M.: Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$ minimization. Proc. Natl. Acad. Sci. U.S.A. 100(5), 2197–2202 (2003)

15. Donoho, D.L., Huo, X.: Uncertainty principles and ideal atomic decompositions. IEEE Trans. Inf. Theory 47(7), 2845–2862 (2001)

16. Donoho, D.L., Johnstone, I.M.: Minimax estimation via wavelet shrinkage. Ann. Stat. 26(3), 879–921 (1998)

17. Donoho, D.L., Stark, P.B.: Uncertainty principles and signal recovery. SIAM J. Appl. Math. 48(3), 906–931 (1989)

18. Donoho, D.L., Elad, M., Temlyakov, V.N.: Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inf. Theory 52(1), 6–18 (2006)

19. Duffin, R.J., Schaeffer, A.C.: A class of nonharmonic Fourier series. Trans. Am. Math. Soc. 72, 341–366 (1952)

20. Fornasier, M., Gröchenig, K.: Intrinsic localization of frames. Constr. Approx. 22(3), 395–415 (2005)

21. Fornasier, M., Rauhut, H.: Compressive sensing. In: Scherzer, O. (ed.) Handbook of Mathematical Methods in Imaging, pp. 187–228. Springer, New York (2011)

22. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Applied and Numerical Harmonic Analysis. Birkhäuser, Basel

23. Gröchenig, K.: Foundations of Time-Frequency Analysis. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston (2001)

24. Mallat, S.G., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)

25. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24, 227–234 (1995)

26. Rauhut, H.: Compressive sensing and structured random matrices. In: Fornasier, M. (ed.) Theoretical Foundations and Numerical Methods for Sparse Recovery. Radon Series on Computational and Applied Mathematics, vol. 9, pp. 1–92. De Gruyter, Berlin/New York (2010)

27. Roberts, D.H.: Time series analysis with clean I. Derivation of a spectrum. Astronom. J. 93, 968–989 (1987)

28. Rudelson, M., Vershynin, R.: On sparse reconstruction from Fourier and Gaussian measurements. Commun. Pure Appl. Math. 61, 1025–1045 (2008)

29. Strohmer, T., Heath, R.W.: Grassmannian frames with applications to coding and communication. Appl. Comput. Harmon. Anal. 14(3), 257–275 (2003)

30. Temlyakov, V.N.: Nonlinear methods of approximation. Found. Comput. Math. 3(1), 33–107 (2003)

31. Temlyakov, V.: Greedy Approximation. Cambridge Monographs on Applied and Computational Mathematics, No. 20. Cambridge University Press, Cambridge (2011)

32. Triebel, H.: Theory of Function Spaces. Monographs in Mathematics, vol. 78. Birkhäuser, Boston (1983)

33. Tropp, J.A.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004)

34. Tropp, J.A.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 51(3), 1030–1051 (2006)

35. Tropp, J.A.: On the conditioning of random subdictionaries. Appl. Comput. Harmon. Anal. 25(1), 1–24 (2008)

36. Wojtaszczyk, P.: A Mathematical Introduction to Wavelets. Cambridge University Press, Cambridge (1997)

Special Functions: Computation

Amparo Gil 1, Javier Segura 2, and Nico M. Temme 3
1 Departamento de Matemática Aplicada y Ciencias de la Computación, Universidad de Cantabria, E.T.S. Caminos, Canales y Puertos, Santander, Spain
2 Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Santander, Spain
3 Centrum voor Wiskunde en Informatica (CWI), Amsterdam, The Netherlands

Mathematics Subject Classification

65D20; 41A60; 33C05; 33C10; 33C15

Overview

The special functions of mathematical physics [4] are those functions that play a key role in many problems in science and engineering. For example, Bessel, Legendre, or parabolic cylinder functions are well known to everyone involved in physics. This is not surprising, because Bessel functions appear in the solution of partial differential equations in cylindrically symmetric domains (such as optical fibers) or in the Fourier transform of radially symmetric functions, to mention just a couple of applications. On the other hand, Legendre functions appear in the solution of electromagnetic problems involving spherical or spheroidal geometries. Finally, parabolic cylinder functions are involved, for example, in the analysis of wave scattering by a parabolic cylinder, in the study of gravitational fields, or in quantum mechanical problems such as quantum tunneling or particle production.

But there are many more functions under the term “special functions” which, differently from the examples mentioned above, are not of hypergeometric type, such as some cumulative distribution functions [3, Chap. 10]. These functions also need to be evaluated in many problems in statistics, probability theory, communication theory, or econometrics.

Basic Methods

The methods used for the computation of special functions are varied, depending on the function under consideration as well as on the efficiency and the accuracy demanded. Usual tools for evaluating special functions are the evaluation of convergent and divergent series, the computation of continued fractions, the use of Chebyshev approximations, the computation of the function using integral representations (numerical quadrature), and the numerical integration of ODEs. Usually, several of these methods are needed in order to build an algorithm able to compute a given function for a large range of values of parameters and argument. Also, an important bonus in this kind of algorithm is the possibility of evaluating scaled functions: if, for example, a function $f(z)$ increases exponentially for large $|z|$, the factorization of the exponential term and the computation of a scaled function (without the exponential term) can be used to avoid degradation in the accuracy and overflow problems as $z$ increases. Therefore, the appropriate scaling of a special function can be useful for increasing both the range of computation and the accuracy of the computed expression.


Next, we briefly describe three important techniques which appear ubiquitously in algorithms for special function evaluation: convergent and divergent series, recurrence relations, and numerical quadrature.

Convergent and Divergent Series
Convergent series for special functions usually arise in the form of hypergeometric series

$${}_pF_q(a_1, \dots, a_p;\ b_1, \dots, b_q;\ z) = \sum_{n=0}^{\infty} \frac{(a_1)_n \cdots (a_p)_n}{(b_1)_n \cdots (b_q)_n}\, \frac{z^n}{n!}, \qquad (1)$$

where $p \le q + 1$ and $(a)_n$ is the Pochhammer symbol, also called the shifted factorial, defined by

$$(a)_0 = 1, \quad (a)_n = a(a+1)\cdots(a+n-1)\ (n \ge 1), \quad (a)_n = \frac{\Gamma(a+n)}{\Gamma(a)}. \qquad (2)$$

The series is easy to evaluate because of the recursion $(a)_{n+1} = (a+n)(a)_n$, $n \ge 0$, of the Pochhammer symbols. For example, for the modified Bessel function,

$$I_\nu(z) = \left(\tfrac{1}{2}z\right)^{\nu} \sum_{n=0}^{\infty} \frac{(\tfrac{1}{4}z^2)^n}{\Gamma(\nu+n+1)\, n!} = \frac{(\tfrac{1}{2}z)^{\nu}}{\Gamma(\nu+1)}\; {}_0F_1\!\left(-;\ \nu+1;\ \tfrac{1}{4}z^2\right); \qquad (3)$$

this is a stable representation when $z > 0$ and $\nu \ge 0$, and it is an efficient representation when $z$ is not large compared with $\nu$.
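A Python sketch of this evaluation strategy for (3), using the term ratio implied by the Pochhammer recursion and comparing against scipy.special.iv (the tolerance and truncation bound are illustrative choices):

```python
import numpy as np
from scipy.special import gamma, iv

def bessel_i_series(nu, z, tol=1e-16, nmax=200):
    """Modified Bessel I_nu(z) via the series (3); stable for z > 0, nu >= 0."""
    term = (0.5 * z) ** nu / gamma(nu + 1.0)    # n = 0 term
    total = term
    for n in range(1, nmax):
        term *= 0.25 * z * z / (n * (nu + n))   # ratio of consecutive terms
        total += term
        if abs(term) < tol * abs(total):        # series has converged
            break
    return total

print(bessel_i_series(0.5, 2.0), iv(0.5, 2.0))  # agree to ~machine precision
```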

By a divergent expansion we mean an asymptotic expansion of the form

$$F(z) \sim \sum_{n=0}^{\infty} \frac{c_n}{z^n}, \quad z \to \infty. \qquad (4)$$

The series usually diverges, but it has the property

$$F(z) = \sum_{n=0}^{N-1} \frac{c_n}{z^n} + R_N(z), \quad R_N(z) = \mathcal{O}\left(z^{-N}\right), \quad z \to \infty, \qquad (5)$$

for $N = 0, 1, 2, \dots$, where the order estimate holds for fixed $N$. This is the Poincaré-type expansion, and for special functions like the gamma and Bessel functions such expansions are crucial for their evaluation. Other variants of the expansion are also important, in particular expansions that hold for a certain range of additional parameters (this leads to uniform asymptotic expansions in terms of other special functions like Airy functions, which are useful in turning point problems).

Recurrence Relations
In many important cases, there exist recurrence relations relating different values of the function for different values of its variables; in particular, one can usually find three-term recurrence relations [3, Chap. 4]. In these cases, the efficient computation of special functions uses at some stage the recurrence relations satisfied by such families of functions. In fact, it is difficult to find a computational task which does not rely on recursive techniques: the great advantage of having recursive relations is that they can be implemented with ease. However, the application of recurrence relations can be risky: each step of a recursive process generates not only its own rounding errors but also accumulates the errors of the previous steps. An important aspect is then the study of the numerical condition of the recurrence relations, depending on the initial values for starting the recursion.

If we write the three-term recurrence relation satisfied by the functions $y_n$ as

$$y_{n+1} + b_n y_n + a_n y_{n-1} = 0, \qquad (6)$$

then, if a solution $y^{(m)}_n$ of (6) exists that satisfies $\lim_{n\to+\infty} y^{(m)}_n / y^{(D)}_n = 0$ for all solutions $y^{(D)}_n$ that are linearly independent of $y^{(m)}_n$, we will call $y^{(m)}_n$ the minimal solution. The solution $y^{(D)}_n$ is said to be a dominant solution of the three-term recurrence relation. From a computational point of view, the crucial point is the identification of the character of the function to be evaluated (either minimal or dominant), because the stable direction of application of the recurrence relation is different for evaluating the minimal or a dominant solution of (6): forward for dominant solutions and backward for minimal solutions.


For analyzing whether a special function is minimal or not, analytical information is needed regarding its behavior as $n \to +\infty$.

Assume that for large values of $n$ the coefficients $a_n$, $b_n$ behave as follows:

$$a_n \sim a\, n^{\alpha}, \quad b_n \sim b\, n^{\beta}, \quad ab \neq 0, \qquad (7)$$

with $\alpha$ and $\beta$ real; assume that $t_1$, $t_2$ are the zeros of the characteristic polynomial $\Phi(t) = t^2 + b t + a$, with $|t_1| \ge |t_2|$. Then it follows from Perron's theorem [3, p. 93] that we have the following results:

1. If $\beta > \frac{1}{2}\alpha$, then the difference equation (6) has two linearly independent solutions $f_n$ and $g_n$, with the property

$$\frac{f_n}{f_{n-1}} \sim -\frac{a}{b}\, n^{\alpha-\beta}, \quad \frac{g_n}{g_{n-1}} \sim -b\, n^{\beta}, \quad n \to \infty. \qquad (8)$$

In this case, the solution $f_n$ is minimal.

2. If $\beta = \frac{1}{2}\alpha$ and $|t_1| > |t_2|$, then the difference equation (6) has two linearly independent solutions $f_n$ and $g_n$, with the property

$$\frac{f_n}{f_{n-1}} \sim t_1\, n^{\beta}, \quad \frac{g_n}{g_{n-1}} \sim t_2\, n^{\beta}, \quad n \to \infty. \qquad (9)$$

In this case, the solution $f_n$ is minimal.

3. If $\beta = \frac{1}{2}\alpha$ and $|t_1| = |t_2|$, or if $\beta < \frac{1}{2}\alpha$, then some information is still available, but the theorem is inconclusive with respect to the existence of minimal and dominant solutions.

Let us consider the three-term recurrence relations satisfied by Bessel functions as examples. Ordinary Bessel functions satisfy the recurrence relation

$$y_{n+1} - \frac{2n}{z}\, y_n + y_{n-1} = 0, \quad z \neq 0, \qquad (10)$$

with solutions $J_n(z)$ (the Bessel function of the first kind) and $Y_n(z)$ (the Bessel function of the second kind). This three-term recurrence relation corresponds to (7) with the values $a = 1$, $\alpha = 0$, $b = -\frac{2}{z}$, $\beta = 1$. Then there exist two independent solutions $f_n$ and $g_n$ satisfying

$$\frac{f_{n+1}}{f_n} \sim \frac{z}{2n}, \quad \frac{g_{n+1}}{g_n} \sim \frac{2n}{z}. \qquad (11)$$

As the known asymptotic behavior of the Bessel functions reads

$$J_n(z) \sim \frac{1}{n!} \left(\frac{z}{2}\right)^n, \quad Y_n(z) \sim -\frac{(n-1)!}{\pi} \left(\frac{2}{z}\right)^n, \quad n \to \infty, \qquad (12)$$

it is easy to identify $J_n(z)$ and $Y_n(z)$ as the minimal ($f_n$) and a dominant ($g_n$) solution, respectively, of the three-term recurrence relation (10).

Similar results hold for the modified Bessel functions, with recurrence relation

$$y_{n+1} + \frac{2n}{z}\, y_n - y_{n-1} = 0, \quad z \neq 0, \qquad (13)$$

with solutions $I_n(z)$ (minimal) and $(-1)^n K_n(z)$ (dominant).
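The following sketch illustrates the stable backward direction for the minimal solution $I_n(z)$ of (13), in the spirit of Miller's backward recurrence algorithm; the trial sequence is normalized with the identity $e^z = I_0(z) + 2\sum_{k\ge 1} I_k(z)$, and the starting offset and seed are illustrative choices:

```python
import numpy as np
from scipy.special import iv

def modified_bessel_backward(n, z, start_offset=30):
    """I_0(z), ..., I_n(z) by backward recurrence (Miller's algorithm)."""
    N = n + start_offset                 # start well above the target index
    y = np.zeros(N + 2)
    y[N + 1], y[N] = 0.0, 1e-30          # arbitrary small seed values
    for k in range(N, 0, -1):            # y_{k-1} = y_{k+1} + (2k/z) y_k
        y[k - 1] = y[k + 1] + (2.0 * k / z) * y[k]
    norm = (y[0] + 2.0 * y[1:].sum()) / np.exp(z)   # normalization constant
    return y[: n + 1] / norm

print(modified_bessel_backward(5, 2.0)[5], iv(5, 2.0))
```

Running the same recurrence forward from exact values of $I_0(z)$, $I_1(z)$ would quickly lose all accuracy, since rounding errors excite the dominant solution $(-1)^n K_n(z)$.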

Numerical Quadrature
Another example where the study of numerical stability is of concern is the computation of special functions via integral representations. It is tempting, but usually wrong, to believe that once an integral representation is given, the computational problem is solved. One has to choose a stable quadrature rule, and this choice depends on the integral under consideration. Particularly problematic is the integration of strongly oscillating integrals (Bessel and Airy functions, for instance); in these cases, an alternative approach consists in finding non-oscillatory representations by properly deforming the integration path in the complex plane. Particularly useful is the saddle point method for obtaining integral representations which are suitable for applying the trapezoidal rule, which is optimal for computing certain integrals on $\mathbb{R}$. Let us explain the saddle point method taking the Airy function $\mathrm{Ai}(z)$ as an example. Quadrature methods for evaluating complex Airy functions can be found in, for example, [1, 2].

We start from the following integral representation in the complex plane:

$$\mathrm{Ai}(z) = \frac{1}{2\pi i} \int_{\mathcal{C}} e^{\frac{1}{3} w^3 - z w}\, dw, \qquad (14)$$

where $z \in \mathbb{C}$ and $\mathcal{C}$ is a contour starting at $\infty\, e^{-i\pi/3}$ and terminating at $\infty\, e^{+i\pi/3}$ (in the valleys of the integrand). In the example we take $\operatorname{ph} z \in [0, \frac{2}{3}\pi]$.

[Special Functions: Computation, Fig. 1: Saddle point contours for $\theta = 0, \frac{1}{3}\pi, \frac{2}{3}\pi, \pi$ and $r = 5$.]

Let $\phi(w) = \frac{1}{3} w^3 - z w$. The saddle points are $w_0 = \sqrt{z}$ and $-w_0$, and follow from solving $\phi'(w) = w^2 - z = 0$. The saddle point contour (the path of steepest descent) that runs through the saddle point $w_0$ is defined by $\Im[\phi(w)] = \Im[\phi(w_0)]$.

We write

$$z = x + iy = r e^{i\theta}, \quad w = u + iv, \quad w_0 = u_0 + i v_0. \qquad (15)$$

Then

$$u_0 = \sqrt{r}\, \cos\tfrac{1}{2}\theta, \quad v_0 = \sqrt{r}\, \sin\tfrac{1}{2}\theta, \quad x = u_0^2 - v_0^2, \quad y = 2 u_0 v_0. \qquad (16)$$

The path of steepest descent through $w_0$ is given by the equation

$$u = u_0 + \frac{(v - v_0)(v + 2 v_0)}{3\left(u_0 + \sqrt{\tfrac{1}{3}(v^2 + 2 v_0 v + 3 u_0^2)}\right)}, \quad -\infty < v < \infty. \qquad (17)$$

Examples for $r = 5$ and a few values of $\theta$ are shown in Fig. 1. The relevant saddle points are located on the circle with radius $\sqrt{r}$ and are indicated by small dots. The saddle point on the positive real axis corresponds with the case $\theta = 0$, and the two saddle points on the imaginary axis with the case $\theta = \pi$. It is interesting to see that the contour may split up and run through both saddle points $\pm w_0$. When $\theta = \frac{2}{3}\pi$, both saddle points are on one path, and the half-line in the $z$-plane corresponding with this $\theta$ is called a Stokes line.

Integrating with respect to $\tau = v - v_0$ (and writing $\sigma = u - u_0$), we obtain

$$\mathrm{Ai}(z) = \frac{e^{-\zeta}}{2\pi i} \int_{-\infty}^{\infty} e^{\psi_r(\sigma,\tau)} \left( \frac{d\sigma}{d\tau} + i \right) d\tau, \qquad (18)$$

where $\zeta = \frac{2}{3} z^{3/2}$ and

$$\sigma = \frac{\tau(\tau + 3 v_0)}{3\left(u_0 + \sqrt{\tfrac{1}{3}(\tau^2 + 4 v_0 \tau + 3 r)}\right)}, \quad -\infty < \tau < \infty, \qquad (19)$$

$$\psi_r(\sigma, \tau) = \Re[\phi(w) - \phi(w_0)] = u_0(\sigma^2 - \tau^2) - 2 v_0 \sigma \tau + \tfrac{1}{3}\sigma^3 - \sigma \tau^2. \qquad (20)$$

The integral representation for the Airy function in (18) is now suitable for applying the trapezoidal rule. The resulting algorithm will be flexible and efficient.
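A sketch of the resulting algorithm, applying the trapezoidal rule to (18)-(20) and comparing with scipy.special.airy; it assumes $\operatorname{ph} z \in [0, \frac{2}{3}\pi)$ away from the Stokes line, and the truncation $T$ and number of nodes are illustrative choices:

```python
import numpy as np
from scipy.special import airy

def airy_saddle(z, T=10.0, n=4000):
    """Ai(z) via the trapezoidal rule on the steepest-descent path (17)."""
    r, theta = abs(z), np.angle(z)
    u0 = np.sqrt(r) * np.cos(0.5 * theta)
    v0 = np.sqrt(r) * np.sin(0.5 * theta)
    zeta = (2.0 / 3.0) * z ** 1.5
    tau = np.linspace(-T, T, n)
    sigma = tau * (tau + 3.0 * v0) / (
        3.0 * (u0 + np.sqrt((tau**2 + 4.0 * v0 * tau + 3.0 * r) / 3.0)))
    psi = (u0 * (sigma**2 - tau**2) - 2.0 * v0 * sigma * tau
           + sigma**3 / 3.0 - sigma * tau**2)          # psi_r of (20), <= 0
    dsigma = np.gradient(sigma, tau)                   # d(sigma)/d(tau)
    g = np.exp(psi) * (dsigma + 1j)
    integral = (tau[1] - tau[0]) * (g.sum() - 0.5 * (g[0] + g[-1]))
    return np.exp(-zeta) * integral / (2.0j * np.pi)

z = 1.0 + 0.5j
print(airy_saddle(z), airy(z)[0])   # airy() returns (Ai, Ai', Bi, Bi')
```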

References

1. Gautschi, W.: Computation of Bessel and Airy functions and of related Gaussian quadrature formulae. BIT 42(1), 110–118 (2002)

2. Gil, A., Segura, J., Temme, N.M.: Algorithm 819: AIZ, BIZ: two Fortran 77 routines for the computation of complex Airy functions. ACM Trans. Math. Softw. 28(3), 325–336 (2002)

3. Gil, A., Segura, J., Temme, N.M.: Numerical Methods for Special Functions. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2007)

4. Olver, F., Lozier, D., Boisvert, R., Clark, C.: NIST Handbook of Mathematical Functions. Cambridge University Press, Cambridge/New York (2010). http://dlmf.nist.gov


Splitting Methods

Sergio Blanes 1, Fernando Casas 2, and Ander Murua 3
1 Instituto de Matemática Multidisciplinar, Universitat Politècnica de València, Valencia, Spain
2 Departament de Matemàtiques and IMAC, Universitat Jaume I, Castellón, Spain
3 Konputazio Zientziak eta A.A. Saila, Informatika Fakultatea, UPV/EHU, Donostia/San Sebastián, Spain

Synonyms

Fractional step methods; Operator-splitting methods

Introduction

Splitting methods constitute a general class of numerical integration schemes for differential equations whose vector field can be decomposed in such a way that each subproblem is simpler to integrate than the original system. For ordinary differential equations (ODEs), this idea can be formulated as follows. Given the initial value problem

$$x' = f(x), \quad x_0 = x(0) \in \mathbb{R}^D \qquad (1)$$

with $f : \mathbb{R}^D \to \mathbb{R}^D$ and solution $\varphi_t(x_0)$, assume that $f$ can be expressed as $f = \sum_{i=1}^{m} f^{[i]}$ for certain functions $f^{[i]}$, such that the equations

$$x' = f^{[i]}(x), \quad x_0 = x(0) \in \mathbb{R}^D, \quad i = 1, \dots, m \qquad (2)$$

can be integrated exactly, with solutions $x(h) = \varphi^{[i]}_h(x_0)$ at $t = h$, the time step. The different parts of $f$ may correspond to physically different contributions. Then, by combining these solutions as

$$\chi_h = \varphi^{[m]}_h \circ \cdots \circ \varphi^{[2]}_h \circ \varphi^{[1]}_h \qquad (3)$$

and expanding into series in powers of $h$, one finds that $\chi_h(x_0) = \varphi_h(x_0) + \mathcal{O}(h^2)$, so that $\chi_h$ provides a first-order approximation to the exact solution. Higher-order approximations can be achieved by introducing more flows with additional coefficients, $\varphi^{[i]}_{a_{ij} h}$, in composition (3).

Splitting methods thus involve three steps: (i) choosing the set of functions $f^{[i]}$ such that $f = \sum_i f^{[i]}$; (ii) solving either exactly or approximately each equation $x' = f^{[i]}(x)$; and (iii) combining these solutions to construct an approximation for (1) up to the desired order.

The splitting idea can also be applied to partial differential equations (PDEs) involving time and one or more space dimensions. Thus, if the spatial differential operator contains parts of a different character (such as advection and diffusion), then different discretization techniques may be applied to each part, as well as for the time integration.

Splitting methods have a long history and have been applied (sometimes with different names) in many different fields, ranging from parabolic and reaction-diffusion PDEs to quantum statistical mechanics, chemical physics, and Hamiltonian dynamical systems [7].

Some of the advantages of splitting methods are the following: they are simple to implement, they are explicit if each subproblem is solved with an explicit method, and they often preserve qualitative properties the differential equation might possess.

Splitting Methods for ODEs

Increasing the Order
Very often in applications, the function $f$ in the ODE (1) can be split in just two parts, $f(x) = f^{[a]}(x) + f^{[b]}(x)$. Then both $\chi_h = \varphi^{[b]}_h \circ \varphi^{[a]}_h$ and its adjoint, $\chi^{*}_h \equiv \chi^{-1}_{-h} = \varphi^{[a]}_h \circ \varphi^{[b]}_h$, are first-order integration schemes. These formulae are often called the Lie–Trotter splitting. On the other hand, the symmetric version

$$S^{[2]}_h \equiv \varphi^{[a]}_{h/2} \circ \varphi^{[b]}_h \circ \varphi^{[a]}_{h/2} \qquad (4)$$

provides a second-order integrator, known as the Strang–Marchuk splitting, the leapfrog, or the Störmer–Verlet method, depending on the context where it is used [2]. Notice that $S^{[2]}_h = \chi^{*}_{h/2} \circ \chi_{h/2}$. More generally, one may consider a composition of the form

$$\psi_h = \varphi^{[a]}_{a_{s+1} h} \circ \varphi^{[b]}_{b_s h} \circ \varphi^{[a]}_{a_s h} \circ \cdots \circ \varphi^{[a]}_{a_2 h} \circ \varphi^{[b]}_{b_1 h} \circ \varphi^{[a]}_{a_1 h} \qquad (5)$$

and try to increase the order of approximation by suitably determining the parameters $a_i$, $b_i$.

and try to increase the order of approximation bysuitably determining the parameters ai , bi . The number

Page 68: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

1354 SplittingMethods

s of 'Œb�h (or 'Œa�h ) evaluations in (5) is usually referred toas the number of stages of the integrator. This is calledtime-symmetric if h D �

h , in which case one hasa left-right palindromic composition. Equivalently, in(5), one has

a1 D asC1; b1 D bs; a2 D as; b2 D bs�1; : : :(6)

The order conditions the parameters ai , bi have tosatisfy can be obtained by relating the previous in-tegrator h with a formal series �h of differentialoperators [1]: it is known that the h-flow 'h of theoriginal system x0 D f Œa�.x/ C f Œb�.x/ satisfies,for each g 2 C1.RD;R/, the identity g.'h.x// Deh.F

Œa�CF Œb�/Œg�.x/, where F Œa� and F Œb� are the Liederivatives corresponding to f Œa� and f Œb�, respec-tively, acting as

F Œa�Œg�.x/ DDXjD1

fŒa�j .x/

@g

@xj.x/;

F Œb�Œg�.x/ DDXjD1

fŒb�j .x/

@g

@xj.x/: (7)

Similarly, the approximation h.x/ 'h.x/ givenby the splitting method (5) satisfies the identityg. h.x// D �.h/Œg�.x/, where

�.h/ D ea1hFŒa�

eb1hFŒb� � � � eashF

Œa�

ebshFŒb�

easC1hFŒa�

:

(8)

Hence, the coefficients $a_i$, $b_i$ must be chosen in such a way that the operator $\Psi(h)$ is a good approximation of $\mathrm{e}^{h(F^{[a]} + F^{[b]})}$, or equivalently, $h^{-1}\log(\Psi(h)) \approx F^{[a]} + F^{[b]}$. Applying repeatedly the Baker–Campbell–Hausdorff (BCH) formula [2], one arrives at

$$\frac{1}{h}\log(\Psi(h)) = (v_a F^{[a]} + v_b F^{[b]}) + h\, v_{ab} F^{[ab]} + h^2 \left(v_{abb} F^{[abb]} + v_{aba} F^{[aba]}\right) + h^3 \left(v_{abbb} F^{[abbb]} + v_{abba} F^{[abba]} + v_{abaa} F^{[abaa]}\right) + \mathcal{O}(h^4), \qquad (9)$$

where

$$F^{[ab]} = [F^{[a]}, F^{[b]}], \quad F^{[abb]} = [F^{[ab]}, F^{[b]}], \quad F^{[aba]} = [F^{[ab]}, F^{[a]}], \quad F^{[abbb]} = [F^{[abb]}, F^{[b]}], \quad F^{[abba]} = [F^{[abb]}, F^{[a]}], \quad F^{[abaa]} = [F^{[aba]}, F^{[a]}],$$

the symbol $[\cdot,\cdot]$ stands for the Lie bracket, and $v_a, v_b, v_{ab}, v_{abb}, v_{aba}, v_{abbb}, \dots$ are polynomials in the parameters $a_i$, $b_i$ of the splitting scheme (5). In particular, one gets $v_a = \sum_{i=1}^{s+1} a_i$, $v_b = \sum_{i=1}^{s} b_i$, and $v_{ab} = \frac{1}{2} - \sum_{i=1}^{s} b_i \sum_{j=1}^{i} a_j$. The order conditions then read $v_a = v_b = 1$ and $v_{ab} = v_{abb} = v_{aba} = \cdots = 0$ up to the order considered. To achieve order $r = 1, 2, 3, \dots, 10$, the number of conditions to be fulfilled is $\sum_{j=1}^{r} n_j$, where $n_j = 2, 1, 2, 3, 6, 9, 18, 30, 56, 99$. This number is smaller for $r > 3$ when dealing with second-order ODEs of the form $y'' = g(y)$ rewritten as (1) [1].

For time-symmetric methods, the order conditions at even orders are automatically satisfied, which leaves $n_1 + n_3 + \cdots + n_{2k-1}$ order conditions to achieve order $r = 2k$. For instance, $n_1 + n_3 = 4$ conditions need to be fulfilled for a symmetric method (5)-(6) to be of order 4.

Splitting and Composition Methods
When the original system (1) is split in $m > 2$ parts, higher-order schemes can be obtained by considering a composition of the basic first-order splitting method (3) and its adjoint $\chi^{*}_h = \varphi^{[1]}_h \circ \cdots \circ \varphi^{[m-1]}_h \circ \varphi^{[m]}_h$. More specifically, compositions of the general form

$$\psi_h = \chi^{*}_{\alpha_{2s} h} \circ \chi_{\alpha_{2s-1} h} \circ \cdots \circ \chi^{*}_{\alpha_2 h} \circ \chi_{\alpha_1 h} \qquad (10)$$

can be considered with appropriately chosen coefficients $(\alpha_1, \dots, \alpha_{2s}) \in \mathbb{R}^{2s}$ so as to achieve a prescribed order of approximation.

In the particular case when system (1) is split in $m = 2$ parts so that $\chi_h = \varphi^{[b]}_h \circ \varphi^{[a]}_h$, method (10) reduces to (5) with $a_1 = \alpha_1$ and

$$b_j = \alpha_{2j-1} + \alpha_{2j}, \quad a_{j+1} = \alpha_{2j} + \alpha_{2j+1}, \quad j = 1, \dots, s, \qquad (11)$$

where $\alpha_{2s+1} = 0$. In that case, the coefficients $a_i$ and $b_i$ are such that

$$\sum_{i=1}^{s+1} a_i = \sum_{i=1}^{s} b_i. \qquad (12)$$


Conversely, any splitting method (5) satisfying (12) can be written in the form (10) with $\chi_h = \varphi^{[b]}_h \circ \varphi^{[a]}_h$.

Moreover, compositions of the form (10) make sense for an arbitrary basic first-order integrator $\chi_h$ (and its adjoint $\chi^{*}_h$) of the original system (1). Obviously, if the coefficients $\alpha_j$ of a composition method (10) are such that $\psi_h$ is of order $r$ for arbitrary basic integrators $\chi_h$ of (1), then the splitting method (5) with (11) is also of order $r$. Actually, as shown in [6], the integrator (5) is of order $r$ for ODEs of the form (1) with $f = f^{[a]} + f^{[b]}$ if and only if the integrator (10) (with coefficients $\alpha_j$ obtained from (11)) is of order $r$ for arbitrary first-order integrators $\chi_h$.

This close relationship allows one to establish in an elementary way a defining feature of splitting methods (5) of order $r \ge 3$: at least one $a_i$ and one $b_i$ are necessarily negative [1]. In other words, splitting schemes of order $r \ge 3$ always involve backward fractional time steps.

Preserving Properties
Assume that the individual flows $\varphi^{[i]}_h$ share with the exact flow $\varphi_h$ some defining property which is preserved by composition. Then it is clear that any composition of the form (5) or (10) with $\chi_h$ given by (3) also possesses this property. Examples of such features are symplecticity, unitarity, volume preservation, conservation of first integrals, etc. [7]. In this sense, splitting methods form an important class of geometric numerical integrators [2]. Repeated application of the BCH formula can be used (see (9)) to show that there exists a modified (formal) differential equation

$$\tilde x' = f_h(\tilde x) \equiv f(\tilde x) + h f_2(\tilde x) + h^2 f_3(\tilde x) + \cdots, \quad \tilde x(0) = x_0, \qquad (13)$$

associated to any splitting method $\psi_h$, such that the numerical solution $x_n = \psi_h(x_{n-1})$ ($n = 1, 2, \dots$) satisfies $x_n = \tilde x(nh)$ for the exact solution $\tilde x(t)$ of (13). An important observation is that the vector fields $f_k$ in (13) belong to the Lie algebra generated by $f^{[1]}, \dots, f^{[m]}$. In the particular case of autonomous Hamiltonian systems, if the $f^{[i]}$ are Hamiltonian, then each $f_k$ is also Hamiltonian. Then one may study the long-time behavior of the numerical integrator by analyzing the solutions of (13), viewed as a small perturbation of the original system (1), and obtain rigorous statements with techniques of backward error analysis [2].

Further Extensions
Several extensions can be considered to reduce the number of stages necessary to achieve a given order and to get more efficient methods. One of them is the use of a processor or corrector. The idea consists in enhancing an integrator $\psi_h$ (the kernel) with a map $\pi_h$ (the processor) as $\hat\psi_h = \pi_h \circ \psi_h \circ \pi_h^{-1}$. Then, after $n$ steps, one has $\hat\psi^n_h = \pi_h \circ \psi^n_h \circ \pi_h^{-1}$, and so only the cost of $\psi_h$ is relevant. The simplest example of a processed integrator is provided by the Störmer–Verlet method (4). In that case, $\psi_h = \chi_h = \varphi^{[b]}_h \circ \varphi^{[a]}_h$ and $\pi_h = \varphi^{[a]}_{h/2}$. The use of processing allows one to get methods with fewer stages in the kernel and smaller error terms than standard compositions [1].

The second extension uses the flows corresponding to other vector fields in addition to $F^{[a]}$ and $F^{[b]}$. For instance, one could consider methods (5) that, in addition to $\varphi^{[a]}_h$ and $\varphi^{[b]}_h$, use the $h$-flow $\varphi^{[abb]}_h$ of the vector field $F^{[abb]}$ when its computation is straightforward. This happens, for instance, for second-order ODEs $y'' = g(y)$ [1, 7].

Splitting is particularly appropriate when $\|f^{[a]}\| \ll \|f^{[b]}\|$ in (1). Introducing a small parameter $\varepsilon$, we can write $x' = \varepsilon f^{[a]}(x) + f^{[b]}(x)$, so that the error of scheme (5) is $\mathcal{O}(\varepsilon)$. Moreover, since in many practical applications $\varepsilon < h$, one is mainly interested in eliminating error terms with small powers of $\varepsilon$ instead of satisfying all the order conditions. In this way, it is possible to get more efficient schemes. In addition, the use of a processor allows one to eliminate the errors of order $\varepsilon h^k$ for all $1 < k < n$ and all $n$ [7].

Although only autonomous differential equations have been considered here, several strategies exist for adapting splitting methods also to nonautonomous systems without deteriorating their overall efficiency [1].

Some Good Fourth-Order Splitting Methods
In the following table, we collect the coefficients of a few selected fourth-order symmetric methods of the form (5)-(6). Higher-order and more elaborate schemes can be found in [1, 2, 7] and references therein. They are denoted as Xs4, where s indicates the number of stages. S64 is a general splitting method, whereas SN64 refers to a method tailored for second-order ODEs of the form $y'' = g(y)$ when they are rewritten as a first-order system (1), with the coefficients $a_i$ associated to $g(y)$. Finally, SNI54 is a method especially designed for problems of the form $x' = \varepsilon f^{[a]}(x) + f^{[b]}(x)$. With $s = 3$ stages, there is only one solution, S34, given by $a_1 = b_1/2$, $b_1 = 1/(2 - 2^{1/3})$. In all cases, the remaining coefficients are fixed by symmetry and consistency ($\sum_i a_i = \sum_i b_i = 1$).

S64:   a1 = 0.07920369643119565      b1 = 0.209515106613362
       a2 = 0.353172906049774        b2 = -0.143851773179818
       a3 = -0.04206508035771952

SN64:  a1 = 0.08298440641740515      b1 = 0.245298957184271
       a2 = 0.396309801498368        b2 = 0.604872665711080
       a3 = -0.3905630492234859

SNI54: a1 = 0.81186273854451628884   b1 = -0.0075869131187744738
       a2 = -0.67748039953216912289  b2 = 0.31721827797316981388
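As an illustration, the following Python sketch implements one step of the palindromic scheme (5)-(6) for the kinetic/potential partition $q' = p$, $p' = g(q)$, and checks the fourth-order behavior of S34 on the harmonic oscillator (the test problem and step sizes are arbitrary choices):

```python
import numpy as np

def splitting_step(q, p, h, g, a, b):
    """One step of scheme (5)-(6): drifts with coefficients a (s+1 entries),
    kicks with coefficients b (s entries), applied right to left."""
    for ai, bi in zip(a[:-1], b):
        q = q + ai * h * p          # flow of f^[a] (drift)
        p = p + bi * h * g(q)       # flow of f^[b] (kick)
    return q + a[-1] * h * p, p

# S34: the 3-stage symmetric 4th-order method quoted in the text.
b1 = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
a1 = b1 / 2.0
a34 = np.array([a1, 0.5 - a1, 0.5 - a1, a1])    # symmetry + consistency
b34 = np.array([b1, 1.0 - 2.0 * b1, b1])

# Order check on q'' = -q with exact solution cos(t) at t = 1.
g = lambda q: -q
for h in (0.1, 0.05):
    q, p = 1.0, 0.0
    for _ in range(int(round(1.0 / h))):
        q, p = splitting_step(q, p, h, g, a34, b34)
    print(h, abs(q - np.cos(1.0)))   # the error scales like h^4
```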

Numerical Example: A Perturbed Kepler Problem
To illustrate the performance of the previous splitting methods, we apply them to the time integration of the perturbed Kepler problem described by the Hamiltonian

$$H = \frac{1}{2}(p_1^2 + p_2^2) - \frac{1}{r} - \frac{\varepsilon}{2 r^5}\left(q_2^2 - 2 q_1^2\right), \qquad (14)$$

where $r = \sqrt{q_1^2 + q_2^2}$. We take $\varepsilon = 0.001$ and integrate the equations of motion $q_i' = p_i$, $p_i' = -\partial H / \partial q_i$, $i = 1, 2$, with initial conditions $q_1 = 4/5$, $q_2 = p_1 = 0$, $p_2 = \sqrt{3/2}$. Splitting methods are used with the partition into kinetic and potential energy. We measure the two-norm error in the position at $t_f = 2000$, $(q_1, q_2) = (0.318965403761932, 1.15731646810481)$, for different time steps and plot the corresponding error as a function of the number of evaluations for each method in Fig. 1. Notice that although the generic method S64 has three more stages than the minimum given by S34, this extra cost is greatly compensated by a higher accuracy. On the other hand, since this system corresponds to the second-order ODE $q'' = g(q)$, method SN64 leads to a higher accuracy with the same computational cost. Finally, SNI54 takes advantage of the near-integrable character of the Hamiltonian (14) and the two extra stages to achieve an even higher efficiency. It requires solving the Kepler problem separately from the perturbation, which calls for a more elaborate algorithm with a slight increase in the computational cost (not reflected in the figure). Results provided by the leapfrog method S2 and the standard fourth-order Runge–Kutta integrator RK4 are also included for reference.

Splitting Methods for PDEs

In the numerical treatment of evolutionary PDEs of parabolic or mixed hyperbolic-parabolic type, splitting time-integration methods are also widely used. In this setting, the overall evolution operator is formally written as a sum of evolution operators, typically representing different aspects of a given model. Consider an evolutionary PDE formulated as an abstract Cauchy problem in a certain function space $\mathcal{U} \subset \{u : \mathbb{R}^D \times \mathbb{R} \to \mathbb{R}\}$,

$$u_t = \mathcal{L}(u), \quad u(t_0) = u_0, \qquad (15)$$

where $\mathcal{L}$ is a spatial partial differential operator. For instance,

$$\frac{\partial}{\partial t} u(x,t) = \sum_{j=1}^{d} \frac{\partial}{\partial x_j}\left( \sum_{i=1}^{d} c_i(x)\, \frac{\partial}{\partial x_i} u(x,t) \right) + f(x, u(x,t)), \quad u(x, t_0) = u_0(x),$$

or, in short, $\mathcal{L}(x, u) = \nabla \cdot (c \nabla u) + f(u)$, corresponds to a diffusion-reaction problem. In that case, it makes sense to split the problem into two subequations corresponding to the different physical contributions,

$$u_t = \mathcal{L}_a(u) \equiv \nabla \cdot (c \nabla u), \qquad u_t = \mathcal{L}_b(u) \equiv f(u), \qquad (16)$$

solve numerically each equation in (16), thus giving $u^{[a]}(h) = \varphi^{[a]}_h(u_0)$ and $u^{[b]}(h) = \varphi^{[b]}_h(u_0)$, respectively, for a time step $h$, and then compose the operators $\varphi^{[a]}_h$, $\varphi^{[b]}_h$ to construct an approximation to the solution of (15). Thus, $u(h) \approx \varphi^{[b]}_h(\varphi^{[a]}_h(u_0))$ provides a first-order approximation, whereas the Strang splitting $u(h) \approx \varphi^{[a]}_{h/2}(\varphi^{[b]}_h(\varphi^{[a]}_{h/2}(u_0)))$ is formally second-order accurate for sufficiently smooth solutions. In this way, especially adapted numerical methods can be used to integrate each subproblem, even in parallel [3, 4].
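A sketch of the Strang splitting for a one-dimensional diffusion-reaction problem with constant $c$ and periodic boundary conditions, advancing the diffusion part exactly in Fourier space and the reaction ODE with a midpoint step (the grid, reaction term, and step size in the example are illustrative choices):

```python
import numpy as np

def strang_diffusion_reaction(u0, c, f, h, n_steps, dx):
    """Strang splitting phi^[a]_{h/2} phi^[b]_h phi^[a]_{h/2} for
    u_t = c u_xx + f(u) on a periodic 1D grid (constant diffusion c)."""
    N = u0.size
    k = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)
    half = np.exp(-c * k**2 * (0.5 * h))      # exact diffusion over h/2
    u = u0.copy()
    for _ in range(n_steps):
        u = np.fft.ifft(half * np.fft.fft(u)).real   # phi^[a]_{h/2}
        u = u + h * f(u + 0.5 * h * f(u))            # phi^[b]_h, midpoint RK2
        u = np.fft.ifft(half * np.fft.fft(u)).real   # phi^[a]_{h/2}
    return u

# Example: Fisher-type reaction f(u) = u(1 - u) on [0, 2*pi).
x = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
u = strang_diffusion_reaction(np.exp(-4.0 * (x - np.pi)**2), c=0.1,
                              f=lambda v: v * (1.0 - v), h=0.01,
                              n_steps=100, dx=x[1] - x[0])
```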


[Splitting Methods, Fig. 1: $\log_{10}$ of the error in the solution $(q_1(t_f), q_2(t_f))$ vs. $\log_{10}$ of the number of evaluations for the methods S2, RK4, S34, S64, SN64, and SNI54 (the extra cost in the method SNI54, designed for perturbed problems, is not taken into account).]

Systems of hyperbolic conservation laws, such as

$$u_t + f(u)_x + g(u)_x = 0, \quad u(x, t_0) = u_0(x),$$

can also be treated with splitting methods, in this case by fixing a step size $h$ and applying an especially tailored numerical scheme to each scalar conservation law $u_t + f(u)_x = 0$ and $u_t + g(u)_x = 0$. This is a particular example of dimensional splitting, where the original problem is approximated by solving one space direction at a time. Early examples of dimensional splitting are the so-called locally one-dimensional (LOD) methods (such as the LOD-backward Euler and LOD Crank–Nicolson schemes) and alternating direction implicit (ADI) methods (e.g., the Peaceman–Rachford algorithm) [4].

Although the formal analysis of splitting methods in this setting can also be carried out by power series expansions, several fundamental difficulties arise, however. First, nonlinear PDEs in general possess solutions that exhibit complex behavior in small regions of space and time, such as sharp transitions and discontinuities. Second, even if the exact solution of the original problem is smooth, it might happen that the composition defining the splitting method provides nonsmooth approximations. Therefore, it is necessary to develop sophisticated tools to analyze whether the numerical solution constructed with a splitting method leads to the correct solution of the original problem or not [3].

On the other hand, even if the solution is sufficiently smooth, applying splitting methods of order higher than two is not possible for certain problems. This happens, in particular, when there is a diffusion term in the equation, since then the presence of negative coefficients in the method leads to an ill-posed problem. When $c = 1$ in (16), this order barrier has been circumvented, however, with the use of complex-valued coefficients with positive real parts: the operator $\varphi^{[a]}_{zh}$ corresponding to the Laplacian $\mathcal{L}_a$ is still well defined in a reasonable distribution set for $z \in \mathbb{C}$, provided that $\Re(z) \ge 0$.

There exist also relevant problems where high-order splitting methods can be safely used, as in the integration of the time-dependent Schrödinger equation $i u_t = -\frac{1}{2m}\Delta u + V(x)\, u$, split into the kinetic $T = -(2m)^{-1}\Delta$ and potential $V$ energy operators, with periodic boundary conditions. In this case, the combination of the Strang splitting in time and Fourier collocation in space is quite popular in chemical physics (under the name of split-step Fourier method). These schemes have appealing structure-preserving properties, such as unitarity, symplecticity, and time-symmetry [5]. Moreover, it has been shown that for a method (5) of order $r$ with the splitting into kinetic and potential energy, and under relatively mild assumptions on $T$ and $V$, one has the $r$th-order error bound

$$\|\psi^n_h u_0 - u(nh)\| \le C\, n\, h^{r+1} \max_{0 \le s \le nh} \|u(s)\|_r$$

in terms of the $r$th-order Sobolev norm [5].
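A sketch of the split-step Fourier method for the equation above on a periodic grid; each stage is a unitary operation (the potential, grid, and step size in the example are arbitrary choices):

```python
import numpy as np

def split_step_schrodinger(psi0, V, dx, h, n_steps, m=1.0):
    """Strang split-step Fourier for i u_t = -(1/2m) u_xx + V(x) u:
    half potential kick, exact kinetic step in Fourier space, half kick."""
    N = psi0.size
    k = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)
    kinetic = np.exp(-1j * k**2 * h / (2.0 * m))   # kinetic propagator
    kick = np.exp(-1j * V * h / 2.0)               # half potential kick
    psi = psi0.astype(complex)
    for _ in range(n_steps):
        psi = kick * psi
        psi = np.fft.ifft(kinetic * np.fft.fft(psi))
        psi = kick * psi
    return psi

# Example: Gaussian wave packet in a harmonic trap V(x) = x^2 / 2.
x = np.linspace(-10.0, 10.0, 256, endpoint=False)
psi = split_step_schrodinger(np.exp(-x**2), V=0.5 * x**2,
                             dx=x[1] - x[0], h=0.01, n_steps=500)
print(np.sum(np.abs(psi)**2) * (x[1] - x[0]))   # norm is preserved
```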

Cross-References

Composition Methods
One-Step Methods, Order, Convergence
Symmetric Methods
Symplectic Methods

References

1. Blanes, S., Casas, F., Murua, A.: Splitting and composition methods in the numerical integration of differential equations. Bol. Soc. Esp. Mat. Apl. 45, 89–145 (2008)

2. Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration. Structure-Preserving Algorithms for Ordinary Differential Equations, 2nd edn. Springer, Berlin (2006)

3. Holden, H., Karlsen, K.H., Lie, K.A., Risebro, N.H.: Splitting Methods for Partial Differential Equations with Rough Solutions. European Mathematical Society, Zürich (2010)

4. Hundsdorfer, W., Verwer, J.G.: Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations. Springer, Berlin (2003)

5. Lubich, C.: From Quantum to Classical Molecular Dynamics: Reduced Models and Numerical Analysis. European Mathematical Society, Zürich (2008)

6. McLachlan, R.I.: On the numerical integration of ordinary differential equations by symmetric composition methods. SIAM J. Sci. Comput. 16, 151–168 (1995)

7. McLachlan, R.I., Quispel, R.: Splitting methods. Acta Numer. 11, 341–434 (2002)

Stability, Consistency, and Convergence of Numerical Discretizations

Douglas N. Arnold
School of Mathematics, University of Minnesota, Minneapolis, MN, USA

Overview

A problem in differential equations can rarely be solved analytically, and so it is often discretized, resulting in a discrete problem which can be solved in a finite sequence of algebraic operations, efficiently implementable on a computer. The error in a discretization is the difference between the solution of the original problem and the solution of the discrete problem, which must be defined so that the difference makes sense and can be quantified. Consistency of a discretization refers to a quantitative measure of the extent to which the exact solution satisfies the discrete problem. Stability of a discretization refers to a quantitative measure of the well-posedness of the discrete problem. A fundamental result in numerical analysis is that the error of a discretization may be bounded in terms of its consistency and stability.

A Framework for Assessing Discretizations
Many different approaches are used to discretize differential equations: finite differences, finite elements, spectral methods, integral equation approaches, etc. Despite the diversity of methods, fundamental concepts such as error, consistency, and stability are relevant to all of them. Here, we describe a framework general enough to encompass all these methods, although we do restrict to linear problems to avoid many complications. To understand the definitions, it is good to keep some concrete examples in mind, and so we start with two of these.

A Finite Difference Method
As a first example, consider the solution of the Poisson equation, $\Delta u = f$, on a domain $\Omega \subset \mathbb{R}^2$, subject to the Dirichlet boundary condition $u = 0$ on $\partial\Omega$. One possible discretization is a finite difference method, which we describe in the case where $\Omega = (0,1) \times (0,1)$ is the unit square. Making reference to Fig. 1, let $h = 1/n$, $n > 1$ integer, be the grid size, and define the grid domain, $\Omega_h = \{(lh, mh) \mid 0 < l, m < n\}$, as the set of grid points in $\Omega$. The nearest neighbors of a grid point $p = (p_1, p_2)$ are the four grid points $p^W = (p_1 - h, p_2)$, $p^E = (p_1 + h, p_2)$, $p^S = (p_1, p_2 - h)$, and $p^N = (p_1, p_2 + h)$. The grid points which do not themselves belong to $\Omega$, but which have a nearest neighbor in $\Omega$, constitute the grid boundary, $\partial\Omega_h$, and we set $\bar\Omega_h = \Omega_h \cup \partial\Omega_h$. Now let $v : \bar\Omega_h \to \mathbb{R}$ be a grid function. Its five-point Laplacian $\Delta_h v$ is defined by

$$\Delta_h v(p) = \frac{v(p^E) + v(p^W) + v(p^S) + v(p^N) - 4 v(p)}{h^2}, \quad p \in \Omega_h.$$

[Stability, Consistency, and Convergence of Numerical Discretizations, Fig. 1: The grid domain $\bar\Omega_h$ consists of the points in $\Omega_h$, marked with solid dots, and in $\partial\Omega_h$, marked with hollow dots. On the right is the stencil of the five-point Laplacian, which consists of a grid point $p$ and its four nearest neighbors.]

The finite difference discretization then seeks $u_h : \bar\Omega_h \to \mathbb{R}$ satisfying

$$\Delta_h u_h(p) = f(p), \quad p \in \Omega_h; \qquad u_h(p) = 0, \quad p \in \partial\Omega_h.$$

If we regard as unknowns the $N = (n-1)^2$ values $u_h(p)$ for $p \in \Omega_h$, this gives us a system of $N$ linear equations in $N$ unknowns which may be solved very efficiently.
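A compact sketch of this discretization, assembling the five-point Laplacian as a Kronecker sum of one-dimensional second-difference matrices and solving $\Delta_h u_h = f$ with a sparse direct solver (the grid size and test data are illustrative choices):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def five_point_solve(n, f):
    """Solve Delta_h u_h = f on the unit square with u_h = 0 on the boundary.

    The five-point Laplacian on the (n-1)^2 interior grid points is the
    Kronecker sum of 1D second-difference matrices."""
    h = 1.0 / n
    m = n - 1
    T = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(m, m)) / h**2
    I = sp.identity(m)
    A = sp.kron(I, T) + sp.kron(T, I)          # five-point Laplacian
    x = np.arange(1, n) * h                    # interior grid coordinates
    X, Y = np.meshgrid(x, x, indexing="ij")
    return spsolve(A.tocsc(), f(X, Y).ravel()).reshape(m, m)

# Test with u = sin(pi x) sin(pi y), for which Delta u = -2 pi^2 u.
uh = five_point_solve(32, lambda X, Y: -2.0 * np.pi**2
                      * np.sin(np.pi * X) * np.sin(np.pi * Y))
```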

A Finite Element Method
A second example of a discretization is provided by a finite element solution of the same problem. In this case, we assume that $\Omega$ is a polygon furnished with a triangulation $\mathcal{T}_h$, such as pictured in Fig. 2. The finite element method seeks a function $u_h : \Omega \to \mathbb{R}$ which is continuous and piecewise linear with respect to the mesh and vanishing on $\partial\Omega$, and which satisfies

$$-\int_\Omega \nabla u_h \cdot \nabla v\, dx = \int_\Omega f v\, dx,$$

for all test functions $v$ which are themselves continuous and piecewise linear with respect to the mesh and vanish on $\partial\Omega$. If we choose a basis for this space of test functions, then the computation of $u_h$ may be reduced to an efficiently solvable system of $N$ linear equations in $N$ unknowns, where, in this case, $N$ is the number of interior vertices of the triangulation.

Discretization

We may treat both these examples, and many other discretizations, in a common framework. We regard the discrete operator as a linear map $L_h$ from a vector space $V_h$, called the discrete solution space, to a second vector space $W_h$, called the discrete data space. In the case of the finite difference operator, the discrete solution space is the space of mesh functions on $\bar\Omega_h$ which vanish on $\partial\Omega_h$, the discrete data space is the space of mesh functions on $\Omega_h$, and the discrete operator is $L_h = \Delta_h$, the five-point Laplacian. In the case of the finite element method, $V_h$ is the space of continuous piecewise linear functions with respect to the given triangulation that vanish on $\partial\Omega$, and $W_h = V_h^*$, the dual space of $V_h$. The operator $L_h$ is given by

$$(L_h w)(v) = -\int_\Omega \nabla w \cdot \nabla v \, dx, \qquad w, v \in V_h.$$

For the finite difference method, we define the discrete data $f_h \in W_h$ by $f_h = f|_{\Omega_h}$, while for the finite element method $f_h \in W_h$ is given by $f_h(v) = \int_\Omega f v \, dx$. In both cases, the discrete solution $u_h \in V_h$ is found by solving the discrete equation

$$L_h u_h = f_h. \qquad (1)$$

Of course, a minimal requirement on the discretization is that the finite dimensional linear system (1) has a unique solution, i.e., that the associated matrix is invertible (so $V_h$ and $W_h$ must have the same dimension). Then, the discrete solution $u_h$ is well-defined. The primary goal of numerical analysis is to ensure that the discrete solution is a good approximation of the true solution $u$ in an appropriate sense.


Fig. 2  A finite element mesh of the domain $\Omega$. The solution is sought as a piecewise linear function with respect to the mesh

Representative and Error

Since we are interested in the difference between $u$ and $u_h$, we must bring these into a common vector space, where the difference makes sense. To this end, we suppose that a representative $U_h \in V_h$ of $u$ is given. The representative is taken to be an element of $V_h$ which, though not practically computable, is a good approximation of $u$. For the finite difference method, a natural choice of representative is the grid function $U_h = u|_{\Omega_h}$. If we show that the difference $U_h - u_h$ is small, we know that the grid values $u_h(p)$ which determine the discrete solution are close to the exact values $u(p)$. For the finite element method, a good possibility for $U_h$ is the piecewise linear interpolant of $u$, that is, $U_h$ is the piecewise linear function that coincides with $u$ at each vertex of the triangulation. Another popular possibility is to take $U_h$ to be the best approximation of $u$ in $V_h$ in an appropriate norm. In any case, the quantity $U_h - u_h$, which is the difference between the representative of the true solution and the discrete solution, defines the error of the discretization.

At this point we have made our goal more concrete: we wish to ensure that the error, $U_h - u_h \in V_h$, is small. To render this quantitative, we need to select a norm on the finite dimensional vector space $V_h$ with which to measure the error. The choice of norm is an important aspect of the problem presentation, and an appropriate choice must reflect the goal of the computation. For example, in some applications, a large error at a single point of the domain could be catastrophic, while in others only the average error over the domain is significant. In yet other cases, derivatives of $u$ are the true quantities of interest. These cases would lead to different choices of norms. We shall denote the chosen norm of $v \in V_h$ by $\|v\|_h$. Thus, we now have a quantitative goal for our computation, that the error $\|U_h - u_h\|_h$ be sufficiently small.

Consistency and Stability

Consistency Error

Having used the representative $U_h$ of the solution to define the error, we also use it to define a second sort of error, the consistency error, also sometimes called the truncation error. The consistency error is defined to be $L_h U_h - f_h$, which is an element of $W_h$. Now $U_h$ represents the true solution $u$, so the consistency error should be understood as a quantity measuring the extent to which the true solution satisfies the discrete equation (1). Since $Lu = f$, the consistency error should be small if $L_h$ is a good representative of $L$ and $f_h$ a good representative of $f$. In order to relate the norm of the error to the consistency error, we need a norm on the discrete data space $W_h$ as well. We denote this norm by $\|w\|'_h$ for $w \in W_h$, and so our measure of the consistency error is $\|L_h U_h - f_h\|'_h$.

Stability

If a problem in differential equations is well-posed, then, by definition, the solution $u$ depends continuously on the data $f$. On the discrete level, this continuous dependence is called stability. Thus, stability refers to the continuity of the mapping $L_h^{-1} : W_h \to V_h$, which takes the discrete data $f_h$ to the discrete solution $u_h$. Stability is a matter of degree, and an unstable discretization is one for which the modulus of continuity of $L_h^{-1}$ is very large.

To illustrate the notion of instability, and to motivate the quantitative measure of stability we shall introduce below, we consider a simpler numerical problem than the discretization of a differential equation. Suppose we wish to compute the definite integral

$$\tau_{n+1} = \int_0^1 x^n e^{x-1} \, dx, \qquad (2)$$

for $n = 15$. Using integration by parts, we obtain a simple recipe to compute the integral in a short sequence of arithmetic operations:

$$\tau_{n+1} = 1 - n\tau_n, \quad n = 1, \ldots, 15; \qquad \tau_1 = 1 - e^{-1} = 0.632121\ldots \qquad (3)$$

Now suppose we carry out this computation, beginning with $\tau_1 = 0.632121$ (so truncated after six decimal places). We then find that $\tau_{16} = -576{,}909$, which is a truly massive error, since the correct value is $\tau_{16} = 0.0590175\ldots$. If we think of (3) as a discrete solution operator (analogous to $L_h^{-1}$ above) taking the data $\tau_1$ to the solution $\tau_{16}$, then it is a highly unstable scheme: a perturbation of the data of less than $10^{-6}$ leads to a change in the solution of nearly $6 \times 10^5$. In fact, it is easy to see that for (3), a perturbation $\epsilon$ in the data leads to an error of $15! \times \epsilon$ in the solution, a huge instability. It is important to note that the numerical computation of the integral (2) is not a difficult numerical problem. It could easily be computed with Simpson's rule, for example. The crime here is solving the problem with the unstable algorithm (3).
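The computation is easy to reproduce. The following Python sketch (ours) runs the recursion (3) in floating point and confirms the amplification factor $15!$:

import math

tau = 0.632121                  # tau_1 = 1 - 1/e truncated after six decimals
for n in range(1, 16):
    tau = 1.0 - n * tau         # the unstable recursion (3)
print(tau)                      # about -5.8e5, while the true tau_16 = 0.0590175...

eps = 0.632121 - (1.0 - math.exp(-1.0))  # the perturbation of the data
print(math.factorial(15) * eps)          # about 5.8e5: the error magnitude 15! * eps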

Returning to the case of the discretization (1), imagine that we perturb the discrete data $f_h$ to some $\tilde f_h = f_h + \eta_h$, resulting in a perturbation of the discrete solution to $\tilde u_h = L_h^{-1} \tilde f_h$. Using the norms in $W_h$ and $V_h$ to measure the perturbations and then computing the ratio, we obtain

$$\frac{\text{solution perturbation}}{\text{data perturbation}} = \frac{\|\tilde u_h - u_h\|_h}{\|\tilde f_h - f_h\|'_h} = \frac{\|L_h^{-1}\eta_h\|_h}{\|\eta_h\|'_h}.$$

We define the stability constant $C_h^{\mathrm{stab}}$, which is our quantitative measure of stability, as the maximum value this ratio achieves for any perturbation $\eta_h$ of the data. In other words, the stability constant is the norm of the operator $L_h^{-1}$:

$$C_h^{\mathrm{stab}} = \sup_{0 \ne \eta_h \in W_h} \frac{\|L_h^{-1}\eta_h\|_h}{\|\eta_h\|'_h} = \|L_h^{-1}\|_{\mathcal{L}(W_h, V_h)}.$$

Relating Consistency, Stability, and Error

The Fundamental Error Bound

Let us summarize the ingredients we have introduced in our framework to assess a discretization:
• The discrete solution space, $V_h$, a finite dimensional vector space, normed by $\|\cdot\|_h$
• The discrete data space, $W_h$, a finite dimensional vector space, normed by $\|\cdot\|'_h$
• The discrete operator, $L_h : V_h \to W_h$, an invertible linear operator
• The discrete data $f_h \in W_h$
• The discrete solution $u_h$ determined by the equation $L_h u_h = f_h$
• The solution representative $U_h \in V_h$
• The error $U_h - u_h \in V_h$
• The consistency error $L_h U_h - f_h \in W_h$
• The stability constant $C_h^{\mathrm{stab}} = \|L_h^{-1}\|_{\mathcal{L}(W_h, V_h)}$

With this framework in place, we may prove a rigorous error bound, stating that the error is bounded by the product of the stability constant and the consistency error:

$$\|U_h - u_h\|_h \le C_h^{\mathrm{stab}} \, \|L_h U_h - f_h\|'_h. \qquad (4)$$

The proof is straightforward. Since $L_h$ is invertible,

$$U_h - u_h = L_h^{-1}[L_h(U_h - u_h)] = L_h^{-1}(L_h U_h - L_h u_h) = L_h^{-1}(L_h U_h - f_h).$$

Taking norms gives

$$\|U_h - u_h\|_h \le \|L_h^{-1}\|_{\mathcal{L}(W_h, V_h)} \, \|L_h U_h - f_h\|'_h,$$

as claimed.

The Fundamental Theorem

A discretization of a differential equation always entails a certain amount of error. If the error is not small enough for the needs of the application, one generally refines the discretization, for example, using a finer grid size in a finite difference method or a triangulation with smaller elements in a finite element method. Thus, we may consider a whole sequence or family of discretizations, corresponding to finer and finer grids or triangulations or whatever. It is conventional to parametrize these by a positive real number $h$ called the discretization parameter. For example, in the finite difference method, we may use the same $h$ as before, the grid size, and in the finite element method, we can take $h$ to be the maximal triangle diameter or something related to it. We shall call such a family of discretizations a discretization scheme. The scheme is called convergent if the error norm $\|U_h - u_h\|_h$ tends to 0 as $h$ tends to 0. Clearly convergence is a highly desirable property: it means that we can achieve whatever level of accuracy we need, as long as we do a fine enough computation. Two more definitions apply to a discretization scheme. The scheme is consistent if the consistency error norm $\|L_h U_h - f_h\|'_h$ tends to 0 with $h$. The scheme is stable if the stability constant $C_h^{\mathrm{stab}}$ is bounded uniformly in $h$: $C_h^{\mathrm{stab}} \le C^{\mathrm{stab}}$ for some number $C^{\mathrm{stab}}$ and all $h$. From the fundamental error bound, we immediately obtain what may be called the fundamental theorem of numerical analysis: a discretization scheme which is consistent and stable is convergent.

Historical Perspective

Consistency essentially requires that the discrete equations defining the approximate solution are at least approximately satisfied by the true solution. This is an evident requirement and has implicitly guided the construction of virtually all discretization methods, from the earliest examples. Bounds on the consistency error are often not difficult to obtain. For finite difference methods, for example, they may be derived from Taylor's theorem, and, for finite element methods, from simple approximation theory. Stability is another matter. Its central role was not understood until the mid-twentieth century, and there are still many differential equations for which it is difficult to devise or to assess stable methods.

That consistency alone is insufficient for the convergence of a finite difference method was pointed out in a seminal paper of Courant, Friedrichs, and Lewy [2] in 1928. They considered the one-dimensional wave equation and used a finite difference method, analogous to the five-point Laplacian, with a space-time grid of points $(jh, lk)$ with $0 \le j \le n$, $0 \le l \le m$ integers and $h, k > 0$ giving the spatial and temporal grid size, respectively. It is easy to bound the consistency error by $O(h^2 + k^2)$, so setting $k = \lambda h$ for some constant $\lambda > 0$ and letting $h$ tend to 0, one obtains a consistent scheme. However, by comparing the domains of dependence of the true solution and of the discrete solution on the initial data, one sees that this method, though consistent, cannot be convergent if $\lambda > 1$.

Twenty years later, the property of stability of discretizations began to emerge in the work of von Neumann and his collaborators. First, in von Neumann's work with Goldstine on solving systems of linear equations [5], they studied the magnification of round-off error by the repeated algebraic operations involved, somewhat like the simple example (3) of an unstable recursion considered above. A few years later, in a 1950 article with Charney and Fjørtoft [1] on the numerical solution of a convection diffusion equation arising in atmospheric modeling, the authors clearly highlighted the importance of what they called computational stability of the finite difference equations, and they used Fourier analysis techniques to assess the stability of their method. This approach developed into von Neumann stability analysis, still one of the most widely used techniques for determining the stability of finite difference methods for evolution equations.

During the 1950s, there was a great deal of study of the nature of stability of finite difference equations for initial value problems, achieving its capstone in the 1956 survey paper [3] of Lax and Richtmyer. In that context, they formulated the definition of stability given above and proved that, for a consistent difference approximation, stability ensured convergence.

Techniques for Ensuring Stability

Finite Difference Methods

We first consider an initial value problem, for example, the heat equation or wave equation, discretized by a finite difference method using grid size $h$ and time step $k$. The finite difference method advances the solution from some initial time $t_0$ to a terminal time $T$ by a sequence of steps, with the $l$th step advancing the discrete solution from time $(l-1)k$ to time $lk$. At each time level $lk$, the discrete solution is a spatial grid function $u_h^l$, and so the finite difference method defines an operator $G(h,k)$ mapping $u_h^{l-1}$ to $u_h^l$, called the amplification matrix. Since the amplification matrix is applied many times in the course of the calculation ($m = (T - t_0)/k$ times to be precise, a number which tends to infinity as $k$ tends to 0), the solution at the final step $u_h^m$ involves a high power of the amplification matrix, namely $G(h,k)^m$, applied to the data $u_h^0$. Therefore, the stability constant will depend on a bound for $\|G(h,k)^m\|$. Usually this can only be obtained by showing that $\|G(h,k)\| \le 1$ or, at most, $\|G(h,k)\| \le 1 + O(k)$. As a simple example, we may consider an initial value problem for the heat equation with homogeneous boundary conditions on the unit square:

$$\frac{\partial u}{\partial t} = \Delta u, \quad x \in \Omega,\ 0 < t \le T,$$
$$u(x,t) = 0, \quad x \in \partial\Omega,\ 0 < t \le T,$$
$$u(x,0) = u_0(x), \quad x \in \Omega,$$

which we discretize with the five-point Laplacian and forward differences in time:

$$\frac{u^l(p) - u^{l-1}(p)}{k} = \Delta_h u^{l-1}(p), \quad p \in \Omega_h,\ 0 < l \le m, \qquad (5)$$
$$u^l(p) = 0, \quad p \in \partial\Omega_h,\ 0 < l \le m,$$
$$u^0(p) = u_0(p), \quad p \in \Omega_h. \qquad (6)$$

Fig. 3  Finite difference solution of the heat equation using (5). Left: initial data. Middle: discrete solution at $t = 0.03$ computed with $h = 1/20$, $k = 1/2{,}000$ (stable). Right: same computation with $k = 1/1{,}000$ (unstable)

In this case the norm condition on the amplification matrix, $\|G(h,k)\| \le 1$, holds if $4k \le h^2$, but not otherwise, and, indeed, it can be shown that this discretization scheme is stable if and only if that condition is satisfied. Figure 3 illustrates the tremendous difference between a stable and an unstable choice of time step.
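The following small Python experiment (our sketch, mirroring the parameters of Fig. 3; the sharp initial blob is an illustrative choice) exhibits the two regimes: with $h = 1/20$ we have $h^2 = 1/400$, so $k = 1/2{,}000$ satisfies $4k \le h^2$ while $k = 1/1{,}000$ violates it.

import numpy as np

def heat_step(u, k, h):
    # One step of the forward difference scheme (5); u holds interior
    # grid values, and the boundary values are zero.
    U = np.pad(u, 1)
    lap = (U[:-2, 1:-1] + U[2:, 1:-1] + U[1:-1, :-2] + U[1:-1, 2:]
           - 4.0 * U[1:-1, 1:-1]) / h**2
    return u + k * lap

n = 20
h = 1.0 / n
x = np.arange(1, n) * h
X, Y = np.meshgrid(x, x, indexing="ij")
u0 = ((X - 0.5)**2 + (Y - 0.5)**2 < 0.05).astype(float)  # sharp initial blob

for k in (1 / 2000, 1 / 1000):
    u = u0.copy()
    for _ in range(round(0.03 / k)):
        u = heat_step(u, k, h)
    print(f"k = {k:.1e}: max |u| at t = 0.03 is {np.abs(u).max():.3e}")

The first choice of $k$ yields a smoothly decaying solution; the second blows up within the 30 steps taken.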

Several methods are used to bound the norm of the amplification matrix. If an $L^\infty$ norm is chosen, one can often use a discrete maximum principle based on the structure of the matrix. If an $L^2$ norm is chosen, then Fourier analysis may be used if the problem has constant coefficients and simple enough boundary conditions. In other circumstances, more sophisticated matrix or eigenvalue analysis is used.

For time-independent PDEs, such as the Poisson equation, the requirement is to show that the inverse of the discretization operator is bounded uniformly in the grid size $h$. Similar techniques as for the time-dependent problems are applied.

Galerkin Methods

Galerkin methods, of which finite element methods are an important case, treat a problem which can be put into the form: find $u \in V$ such that $B(u,v) = F(v)$ for all $v \in V$. Here, $V$ is a Hilbert space, $B : V \times V \to \mathbb{R}$ is a bounded bilinear form, and $F \in V^*$, the dual space of $V$. (Many generalizations are possible, e.g., to the case where $B$ acts on two different Hilbert spaces or the case of Banach spaces.) This problem is equivalent to a problem in operator form, find $u$ such that $Lu = F$, where the operator $L : V \to V^*$ is defined by $Lu(v) = B(u,v)$. An example is the Dirichlet problem for the Poisson equation considered earlier. Then, $V = \mathring H^1(\Omega)$, $B(u,v) = \int_\Omega \nabla u \cdot \nabla v \, dx$, and $F(v) = \int_\Omega f v \, dx$. The operator is $L = -\Delta : \mathring H^1(\Omega) \to \mathring H^1(\Omega)^*$.

A Galerkin method is a discretization which seeks $u_h$ in a subspace $V_h$ of $V$ satisfying $B(u_h, v) = F(v)$ for all $v \in V_h$. The finite element method discussed


above took $V_h$ to be the subspace of continuous piecewise linears. If the bilinear form $B$ is coercive in the sense that there exists a constant $\gamma > 0$ for which

$$B(v,v) \ge \gamma \|v\|_V^2, \qquad v \in V,$$

then stability of the Galerkin method with respect to the $V$ norm is automatic. No matter how the subspace $V_h$ is chosen, the stability constant is bounded by $1/\gamma$. If the bilinear form is not coercive (or if we consider a norm other than the norm in which the bilinear form is coercive), then finding stable subspaces for Galerkin's method may be quite difficult. As a very simple example, consider a problem on the unit interval $I = (0,1)$: find $(\sigma, u) \in H^1(I) \times L^2(I)$ such that

$$\int_0^1 \sigma\rho \, dx + \int_0^1 \rho' u \, dx + \int_0^1 \sigma' v \, dx = \int_0^1 f v \, dx, \qquad (\rho, v) \in H^1(I) \times L^2(I). \qquad (7)$$

This is a weak formulation of the system $\sigma = u'$, $\sigma' = f$, with Dirichlet boundary conditions (which arise from this weak formulation as natural boundary conditions), so this is another form of the Dirichlet problem for Poisson's equation $u'' = f$ on $I$, $u(0) = u(1) = 0$. In higher dimensions, there are circumstances where such a first-order formulation is preferable to a standard second-order form. This problem can be discretized by a Galerkin method, based on subspaces $S_h \subset H^1(I)$ and $W_h \subset L^2(I)$. However, the choice of subspaces is delicate, even in this one-dimensional context.

If we partition $I$ into subintervals and choose $S_h$ and $W_h$ both to be the space of continuous piecewise linears, then the resulting matrix problem is singular, so the method is unusable. If we choose $S_h$ to be the continuous piecewise linears, and $W_h$ to be the piecewise constants, we obtain a stable method. But if we choose $S_h$ to contain all continuous piecewise quadratic functions and retain the space of piecewise constants for $W_h$, we obtain an unstable scheme. The stable and unstable methods can be compared in Fig. 4. For the same problem of the Poisson equation in first-order form, but in more than one dimension, the first stable elements were discovered in 1975 [4].

Fig. 4  Approximation of the problem (7), with $u = \cos \pi x$ shown on the left and $\sigma = u'$ on the right. The exact solution is shown in blue, and the stable finite element method, using piecewise linears for $\sigma$ and piecewise constants for $u$, is shown in green (in the right plot, the blue curve essentially coincides with the green curve, and so is not visible). An unstable finite element method, using piecewise quadratics for $\sigma$, is shown in red

References

1. Charney, J.G., Fjørtoft, R., von Neumann, J.: Numerical integration of the barotropic vorticity equation. Tellus 2, 237–254 (1950)
2. Courant, R., Friedrichs, K., Lewy, H.: Über die partiellen Differenzengleichungen der mathematischen Physik. Math. Ann. 100(1), 32–74 (1928)
3. Lax, P.D., Richtmyer, R.D.: Survey of the stability of linear finite difference equations. Commun. Pure Appl. Math. 9(2), 267–293 (1956)
4. Raviart, P.A., Thomas, J.M.: A mixed finite element method for 2nd order elliptic problems. In: Mathematical Aspects of Finite Element Methods. Proceedings of the Conference, Consiglio Naz. delle Ricerche (C.N.R.), Rome, 1975. Volume 606 of Lecture Notes in Mathematics, pp. 292–315. Springer, Berlin (1977)
5. von Neumann, J., Goldstine, H.H.: Numerical inverting of matrices of high order. Bull. Am. Math. Soc. 53, 1021–1099 (1947)


Statistical Methods for Uncertainty Quantification for Linear Inverse Problems

Luis Tenorio
Mathematical and Computer Sciences, Colorado School of Mines, Golden, CO, USA

To solve an inverse problem means to recover an unknown object from indirect noisy observations. As an illustration, consider an idealized example of the blurring of a one-dimensional signal, $f(x)$, by a measuring instrument. Assume the function is parametrized so that $x \in [0,1]$ and that the actual data can be modeled as noisy observations of a blurred version of $f$. We may model the blurring as a convolution with a kernel, $K(x)$, determined by the instrument. The forward operator maps $f$ to the blurred function $\mu$ given by $\mu(x) = \int_0^1 K(x-t) f(t) \, dt$ (i.e., a Fredholm integral equation of the first kind). The statistical model for the data is then $y(x) = \mu(x) + \varepsilon(x)$, where $\varepsilon(x)$ is measurement noise. The inverse problem consists of recovering $f$ from finitely many measurements $y(x_1), \ldots, y(x_n)$. However, inverse problems are usually ill-posed (e.g., the estimates may be very sensitive to small perturbations of the data), and deblurring is one such example. A regularization method is required to solve the problem. An introduction to regularization of inverse problems can be found in [11]; more general references are [10, 27].

Since the observations $y(x_i)$ are subject to systematic errors (e.g., discretization) as well as measurement errors that will be modeled as random variables, the solution of an inverse problem should include a summary of the statistical characteristics of the inversion estimate, such as means, standard deviations, bias, mean squared errors, and confidence sets. However, the selection of proper statistical methods to assess estimators depends, of course, on the class of estimators, which in turn is determined by the type of inverse problem and the chosen regularization method. Here we use a general framework that encompasses several different and widely used approaches.

We consider the problem of assessing the statistics of solutions of linear inverse problems whose data are modeled as $y_i = K_i[f] + \varepsilon_i$ $(i = 1, \ldots, n)$, where the functions $f$ belong to a linear space $H$, each $K_i$ is a continuous linear operator defined on $H$, and the errors $\varepsilon_i$ are random variables. Since the errors are random, an estimate $\hat f$ of $f$ is a random variable taking values in $H$. Given the finite amount of data, we can only hope to recover components of $f$ that admit a finite-dimensional parametrization. Such parametrizations also help us avoid defining probability measures in function spaces. For example, we can discretize the operators and the function so that the estimate $\hat f$ is a vector in $\mathbb{R}^m$. Alternatively, one may be able to use a finite-dimensional parametrization such as $f = \sum_{k=1}^m a_k \phi_k$, where the $\phi_k$ are fixed functions defined on $H$. This time the random variable is the estimate $\hat a$ of the vector of coefficients $a = (a_k)$. In either case the problem of finding an estimate of a function reduces to a finite-dimensional linear algebra problem.

Example 1  Consider the inverse problem for a Fredholm integral equation:

$$y_i = y(x_i) = \int_0^1 K(x_i - t) f(t) \, dt + \varepsilon_i, \quad (i = 1, \ldots, n).$$

To discretize the integral, we can use $m$ equally spaced points $t_i$ in $[0,1]$ and define $t'_j = (t_j + t_{j-1})/2$. Then,

$$\mu(x_i) = \int_0^1 K(x_i - y) f(y) \, dy \approx \frac{1}{m} \sum_{j=1}^{m-1} K(x_i - t'_j)\, f(t'_j).$$

Hence we have the approximation $\mu = (\mu(x_1), \ldots, \mu(x_n))^t \approx Kf$ with $K_{ij} = K(x_i - t'_j)$ and $f_i = f(t'_i)$. Writing the discretization error as $\delta = \mu - Kf$, we arrive at the following model for the data vector $y$:

$$y = Kf + \delta + \varepsilon. \qquad (1)$$

As $m \to \infty$ the approximation of the integral improves, but the matrix $K$ becomes more ill-conditioned. To regularize the problem, we define an estimate $\hat f$ of $f$ using penalized least squares:

$$\hat f = \arg\min_{g \in \mathbb{R}^m} \|y - Kg\|^2 + \lambda^2 \|Dg\|^2 = (K^t K + \lambda^2 D^t D)^{-1} K^t y \equiv Ly,$$

where $\lambda > 0$ is a fixed regularization parameter and $D$ is a chosen matrix (e.g., a matrix that computes discrete derivatives). This regularization addresses the ill-conditioning of the matrix $K$; it is a way of adding


the prior information that we expect $\|Df\|$ to be small. The case $D = I$ is known as (discrete) Tikhonov–Phillips regularization. Note that we may write $\hat f(x_i)$ as a linear function of $y$: $\hat f(x_i) = e_i^t L y$, where $\{e_i\}$ is the standard orthonormal basis in $\mathbb{R}^m$.
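As a concrete illustration of Example 1, the following Python sketch (ours; the Gaussian kernel, grid sizes, noise level, and test signal are illustrative assumptions) builds the midpoint-rule matrix $K$, simulates data according to (1), and computes $\hat f = Ly$ with a first-difference penalty matrix $D$:

import numpy as np

rng = np.random.default_rng(0)
n = m = 100
t = (np.arange(m) + 0.5) / m                 # midpoints t'_j, also used as x_i
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.05)**2) / m  # assumed kernel

f_true = np.sin(2 * np.pi * t) * (t > 0.2) * (t < 0.8)        # test signal
y = K @ f_true + 0.01 * rng.standard_normal(n)                # data, model (1)

D = np.eye(m - 1, m, 1) - np.eye(m - 1, m)   # discrete first derivative
lam = 1e-3
L = np.linalg.solve(K.T @ K + lam**2 * D.T @ D, K.T)  # f_hat = L y
f_hat = L @ y
print(np.linalg.norm(f_hat - f_true) / np.linalg.norm(f_true))

Increasing lam damps the noise amplification at the price of additional bias; the rows of L give exactly the coefficients in $\hat f(x_i) = e_i^t L y$.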

In the next two examples, we assume that $H$ is a Hilbert space, and each $K_i : H \to \mathbb{R}$ is a bounded linear operator (and thus continuous). We write $K[f] = (K_1[f], \ldots, K_n[f])^t$.

Example 2  Since $K(H) \subset \mathbb{R}^n$ is finite dimensional, it follows that $K$ is compact, as is its adjoint $K^* : \mathbb{R}^n \to H$, and $K^*K$ is a self-adjoint compact operator on $H$. In addition, there is a collection of orthonormal functions $\{\phi_k\}$ in $H$, orthonormal vectors $\{v_k\}$ in $\mathbb{R}^n$, and a positive, nonincreasing sequence $(\lambda_k)$ such that [22]: (a) $\{\phi_k\}$ is an orthonormal basis for $\mathrm{Null}(K)^\perp$; (b) $\{v_k\}$ is an orthonormal basis for the closure of $\mathrm{Range}(K)$ in $\mathbb{R}^n$; and (c) $K[\phi_k] = \lambda_k v_k$ and $K^*[v_k] = \lambda_k \phi_k$. Write $f = f_0 + f_1$, with $f_0 \in \mathrm{Null}(K)$ and $f_1 \in \mathrm{Null}(K)^\perp$. Then, there are constants $a_k$ such that $f_1 = \sum_{k=1}^n a_k \phi_k$. The data do not provide any information about $f_0$, so without any other information we have no way of estimating such a component of $f$. This introduces a systematic bias. The problem of estimating $f$ is thus reduced to estimating $f_1$, that is, the coefficients $a_k$. In fact, we may transform the data to $\langle y, v_k \rangle = \lambda_k a_k + \langle \varepsilon, v_k \rangle$ and use them to estimate the vector of coefficients $a = (a_k)$; the transformed data based on this sequence define a sequence space model [7, 18]. We may also rewrite the data as $y = Va + \varepsilon$, where $V = (\lambda_1 v_1 \cdots \lambda_n v_n)$. An estimate of $f$ is obtained using a penalized least-squares estimate of $a$:

$$\hat a = \arg\min_{b \in \mathbb{R}^n} \|y - Vb\|^2 + \lambda^2 \|b\|^2.$$

This leads again to an estimate that is linear in $y$; write it as $\hat a = Ly$ for some matrix $L$. The estimate of $f(x)$ is then similar to that in Example 1:

$$\hat f(x) = \phi(x)^t \hat a = \phi(x)^t L y, \qquad (2)$$

with $\phi(x) = (\phi_1(x), \ldots, \phi_n(x))^t$.

If the goal is to estimate pointwise values of $f \in H$, then the Hilbert space $H$ needs to be defined appropriately. For example, if $H = L^2([0,1])$, then pointwise values of $f$ are not well defined. The following example introduces spaces where evaluation at a point (i.e., $f \mapsto f(x)$) is a continuous linear functional.

Example 3  Let $I = [0,1]$. Let $W^m(I)$ be the linear space of real-valued functions on $I$ such that $f$ has $m-1$ continuous derivatives on $I$, $f^{(m-1)}$ is absolutely continuous on $I$ (so $f^{(m)}$ exists almost everywhere on $I$), and $f^{(m)} \in L^2(I)$. The space $W^m(I)$ is a Hilbert space with inner product

$$\langle f, g \rangle = \sum_{k=0}^{m-1} f^{(k)}(0)\, g^{(k)}(0) + \int_I f^{(m)}(x)\, g^{(m)}(x) \, dx$$

and has the following properties [2, 29]: (a) For every $x \in I$, there is a function $\eta_x \in W^m(I)$ such that the linear functional $f \mapsto f(x)$ is continuous on $W^m(I)$ and given by $f \mapsto \langle \eta_x, f \rangle$. The function $R : I \times I \to \mathbb{R}$, $R(x,y) = \langle \eta_x, \eta_y \rangle$, is called a reproducing kernel of the Hilbert space. (b) $W^m(I) = N_{m-1} \oplus H^m$, where $N_{m-1}$ is the space of polynomials of degree at most $m-1$ and $H^m = \{\, f \in W^m(I) : f^{(k)}(0) = 0 \text{ for } k = 0, \ldots, m-1 \,\}$. Since the space $W^m(I)$ satisfies (a), it is called a reproducing kernel Hilbert space (RKHS). To control the smoothness of the Tikhonov estimate, we put a penalty on the derivative of $f_1$, which is the projection $f_1 = P_H f$ onto $H^m$. To write the penalized sum of squares, we use the fact that each functional $K_i : W^m(I) \to \mathbb{R}$ is continuous and thus $K_i f = \langle \eta_i, f \rangle$ for some function $\eta_i \in W^m(I)$. We can then write

$$\|y - K[f]\|^2 + \lambda^2 \int_I (f_1^{(m)}(x))^2 \, dx = \sum_{i=1}^n ( y_i - \langle \eta_i, f \rangle )^2 + \lambda^2 \|P_H f\|^2. \qquad (3)$$

Define $\phi_k(x) = x^{k-1}$ for $k = 1, \ldots, m$ and $\phi_k = P_H \eta_{k-m}$ for $k = m+1, \ldots, m+n$. Then $f = \sum_k a_k \phi_k + \delta$, where $\delta$ belongs to the orthogonal complement of the span of $\{\phi_k\}$. It can be shown that the minimizer $\hat f$ of (3) is again of the form (2) [29]. To estimate $a$ we rewrite (3) as a function of $a$ and use the following estimate:

$$\hat a = \arg\min_b \|y - Xb\|^2 + \lambda^2 b_H^t P b_H,$$

where $X$ is the matrix of inner products $\langle \eta_i, \phi_j \rangle$, $P_{ij} = \langle P_H \phi_i, P_H \phi_j \rangle$, and $a_H = (a_{m+1}, \ldots, a_{m+n})^t$.


These examples describe three different frameworks where the functional estimation is reduced to a finite-dimensional penalized least-squares problem. They serve to motivate the framework we will use in the statistical analysis. We will focus on frequentist statistical methods. Bayesian methods for inverse problems are discussed in [17]; [1, 23] provide a tutorial comparison of frequentist and Bayesian procedures for inverse problems.

Consider first the simpler case of general linear regression: the $n \times 1$ data vector is modeled as $y = Ka + \varepsilon$, where $K$ is an $n \times m$ matrix, $n > m$, $K^t K$ is non-singular, and $\varepsilon$ is a random vector with mean zero and covariance matrix $\sigma^2 I$. The least-squares estimate of $a$ is

$$\hat a = \arg\min_b \|y - Kb\|^2 = (K^t K)^{-1} K^t y, \qquad (4)$$

and it has the following properties: Its expected value is $E(\hat a) = a$ regardless of the true value $a$; that is, $\hat a$ is an unbiased estimator of $a$. The covariance matrix of $\hat a$ is $\mathrm{Var}(\hat a) = \sigma^2 (K^t K)^{-1}$. An unbiased estimator of $\sigma^2$ is $\hat\sigma^2 = \|y - K\hat a\|^2/(n - m)$. Note that the denominator is the difference between the number of observations and the number of estimated parameters; it is a kind of "effective number of observations." It can be shown that $m = \mathrm{tr}(H)$, where $H = K(K^t K)^{-1} K^t$ is the hat matrix; it is the matrix defined by $Hy = \hat y = K\hat a$. The degrees of freedom (dof) of $\hat y$ is defined as the sum of the covariances of $(K\hat a)_i$ with $y_i$ divided by $\sigma^2$ [24]. For linear regression we have $\mathrm{dof}(K\hat a) = m = \mathrm{tr}(H)$. Hence we may write $\hat\sigma^2$ as the residual sum of squares normalized by the effective number of observations:

$$\hat\sigma^2 = \frac{\|y - \hat y\|^2}{n - \mathrm{dof}(\hat y)} = \frac{\|y - \hat y\|^2}{\mathrm{tr}(I - H)}. \qquad (5)$$
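In code, the hat matrix, the degrees of freedom, and the estimate (5) are a few lines (a generic sketch in our own notation):

import numpy as np

def sigma2_hat(y, K):
    # Unbiased noise-variance estimate (5) for least-squares regression.
    H = K @ np.linalg.solve(K.T @ K, K.T)  # hat matrix K (K^t K)^{-1} K^t
    y_hat = H @ y
    dof = np.trace(H)                      # equals m, the number of parameters
    return np.sum((y - y_hat)**2) / (len(y) - dof)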

We now return to ill-posed inverse problems and define a general framework, motivated by Examples 1–2, that is similar to general linear regression. We assume that the data vector has a representation of the form $y = K[f] + \varepsilon = Ka + \delta + \varepsilon$, where $K$ is a linear operator $H \to \mathbb{R}^n$, $K$ is an $n \times m$ matrix, $\delta$ is a fixed unknown vector (e.g., discretization error), and $\varepsilon$ is a random vector of mean zero and covariance matrix $\sigma^2 I$. We also assume that there is an $m \times 1$ vector $a$ and a vector function $\phi$ such that $f(x) = \phi(x)^t a$ for all $x$. The vector $a$ is estimated using penalized least squares:

$$\hat a = \arg\min_b \|y - Kb\|^2 + \lambda^2 b^t S b = (K^t K + \lambda^2 S)^{-1} K^t y, \qquad (6)$$

where $S$ is a symmetric non-negative matrix and $\lambda > 0$ is a fixed regularization parameter. The estimate of $f$ is defined as $\hat f(x) = \phi(x)^t \hat a$.

Bias, Variance, and MSE

For a fixed regularization parameter $\lambda$, the estimator $\hat f(x)$ is linear in $y$, and therefore its mean, bias, variance, and mean squared error can be determined using only knowledge of the first two moments of the distribution of the noise vector $\varepsilon$. Using (6) we find the mean, bias, and variance of $\hat f(x)$:

$$E(\hat f(x)) = \phi(x)^t G_\lambda K^t K[f],$$
$$\mathrm{Bias}(\hat f(x)) = \phi(x)^t G_\lambda K^t K[f] - f(x),$$
$$\mathrm{Var}(\hat f(x)) = \sigma^2 \|K G_\lambda \phi(x)\|^2,$$

where $G_\lambda = (K^t K + \lambda^2 S)^{-1}$. Hence, unlike the least-squares estimate of $a$ (4), the penalized least-squares estimate (6) is biased even when $\delta = 0$. This bias introduces a bias in the estimates of $f$. In terms of $a$ and $\delta$, this bias is

$$\mathrm{Bias}(\hat f(x)) = E(\hat f(x)) - f(x) = \phi(x)^t B_\lambda a + \phi(x)^t G_\lambda K^t \delta, \qquad (7)$$

where $B_\lambda = -\lambda^2 G_\lambda S$. Prior information about $a$ should be used to choose the matrix $S$ so that $\|B_\lambda a\|$ is small. Note that similar formulas can be derived for correlated noise provided the covariance matrix is known. Also, analogous closed formulas can be derived for estimates of linear functionals of $f$.

The mean squared error (MSE) can be used to include the bias and variance in the uncertainty evaluation of $\hat f(x)$; it is defined as the expected value of $(\hat f(x) - f(x))^2$, which is equivalent to the squared bias plus the variance:

$$\mathrm{MSE}(\hat f(x)) = \mathrm{Bias}(\hat f(x))^2 + \mathrm{Var}(\hat f(x)).$$

The integrated mean squared error of $\hat f$ is

$$\mathrm{IMSE}(\hat f) = E \int |\hat f(x) - f(x)|^2 \, dx = \mathrm{Bias}(\hat a)^t F \, \mathrm{Bias}(\hat a) + \sigma^2 \, \mathrm{tr}(F G_\lambda K^t K G_\lambda),$$

where $F = \int \phi(x)\phi(x)^t \, dx$.

The bias component of the MSE is the most difficult to assess, as it depends on the unknown $f$ (or $a$ and $\delta$), but, depending on the available prior information, some inequalities can be derived [20, 26].
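The closed formulas translate directly into a few lines of linear algebra. The sketch below (ours; phi holds the vectors $\phi(x)$ as columns, and the true a and delta must be supplied, e.g., in a simulation study) evaluates the bias (7), the variance, and the MSE of $\hat f(x)$:

import numpy as np

def pointwise_bias_var_mse(K, S, lam, sigma, phi, a, delta):
    # phi: m x p array whose columns are phi(x) at p evaluation points.
    G = np.linalg.inv(K.T @ K + lam**2 * S)            # G_lambda
    B = -lam**2 * G @ S                                # B_lambda
    bias = phi.T @ (B @ a + G @ (K.T @ delta))         # Eq. (7)
    var = sigma**2 * np.sum((K @ G @ phi)**2, axis=0)  # sigma^2 ||K G phi(x)||^2
    return bias, var, bias**2 + var                    # MSE = bias^2 + variance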

Example 4  If $H$ is a Hilbert space and the functionals $K_i$ are bounded, then there are functions $\eta_i \in H$ such that $K_i[f] = \langle \eta_i, f \rangle$, and we may write

$$E[\hat f(x)] = \Big\langle \sum_i a_i(x)\, \eta_i, f \Big\rangle = \langle A_x, f \rangle,$$

where the function $A_x(y) = \sum_i a_i(x)\, \eta_i(y)$ is called the Backus–Gilbert averaging kernel for $\hat f$ at $x$ [3]. In particular, since we would like to have $f(x) = \langle A_x, f \rangle$, we would like $A_x$ to be as concentrated as possible around $x$; a plot of the function $A_x$ may provide useful information about the mean of the estimate $\hat f(x)$. One may also summarize characteristics of $|A_x|$ such as its center and spread about the center (e.g., [20]). Heuristically, $|A_x|$ should be like a $\delta$-function centered at $x$. This can be formalized in an RKHS $H$. In this case, there is a function $\eta_x \in H$ such that $f(x) = \langle \eta_x, f \rangle$, and the bias of $\hat f(x)$ can be written as $\mathrm{Bias}(\hat f(x)) = \langle A_x - \eta_x, f \rangle$; therefore

$$|\mathrm{Bias}(\hat f(x))| \le \|A_x - \eta_x\| \, \|f\|.$$

We can guarantee a small bias when $A_x$ is close to $\eta_x$ in $H$. In actual computations, averaging kernels can be approximated using splines. A discussion of this topic as well as information about available software for splines and reproducing kernels can be found in [14, 20].

Another bound for the bias follows from (7) via the Cauchy–Schwarz and triangle inequalities:

$$|\mathrm{Bias}(\hat f(x))| \le \|G_\lambda \phi(x)\| \, (\lambda^2 \|Sa\| + \|K^t \delta\|).$$

Plots of $\|G_\lambda \phi(x)\|$ (or $\|A_x - \eta_x\|$) as a function of $x$ may provide geometric information (usually conservative) about the bias. Other measures such as the worst or an average bias can be obtained depending on the available prior information we have on $f$ or its parametric representation. For example, if $a$ and $\delta$ are known to lie in convex sets $S_1$ and $S_2$, respectively, then we may determine the maximum of $|\mathrm{Bias}(\hat f)|$ subject to $a \in S_1$ and $\delta \in S_2$. Or, if the prior information leads to the modeling of $a$ and $\delta$ as random variables with means and covariance matrices $\mu_a = Ea$, $\Sigma_a = \mathrm{Var}(a)$, $\mu_\delta = E\delta$, and $\Sigma_\delta = \mathrm{Var}(\delta)$, then the average bias is

$$E[\mathrm{Bias}(\hat f(x))] = \phi(x)^t B_\lambda \mu_a + \phi(x)^t G_\lambda K^t \mu_\delta.$$

Similarly, we can easily derive a bound for the mean squared bias that can be used to put a bound on the average MSE.

Since the bias may be a significant factor in the inference (in some geophysical applications the bias is the dominant component of the MSE), it is important to study the residuals of the fit to determine if a significant bias is present. The mean and covariance matrix of the residual vector $r = y - K\hat a$ are

$$Er = -K\,\mathrm{Bias}(\hat a) + \delta = -K B_\lambda a + (I - H_\lambda)\delta, \qquad (8)$$
$$\mathrm{Var}(r) = \sigma^2 (I - H_\lambda)^2, \qquad (9)$$

where $H_\lambda = K G_\lambda K^t$ is the hat matrix. Equation (8) shows that if there is a significant bias, then we may see a trend in the residuals. From (9) we see that the residuals are correlated and heteroscedastic (i.e., $\mathrm{Var}(r_i)$ depends on $i$) even if the bias is zero, which complicates the interpretation of the plots. To stabilize the variance, it is better to plot residuals that have been corrected for heteroscedasticity, for example, $r'_i = r_i/(1 - (H_\lambda)_{ii})$.

Confidence Intervals

In addition to the mean and variance of $\hat f(x)$, we may construct confidence intervals that are expected to contain $E(\hat f(x))$ with some prescribed probability. We now assume that the noise is Gaussian, $\varepsilon \sim N(0, \sigma^2 I)$. We use $\Phi$ to denote the cumulative distribution function of the standard Gaussian $N(0,1)$ and write $z_\alpha = \Phi^{-1}(1 - \alpha)$.

Since $\hat f(x)$ is a biased estimate of $f(x)$, we can only construct confidence intervals for $E\hat f(x)$. We should therefore interpret the intervals with caution, as they may be incorrectly centered if the bias is significant.

Under the Gaussianity assumption, a confidence interval $I_\alpha(x, \sigma)$ for $E\hat f(x)$ of coverage $1 - \alpha$ is

$$I_\alpha(x, \sigma) = \hat f(x) \pm z_{\alpha/2}\, \sigma \|K G_\lambda \phi(x)\|.$$


That is, for each $x$ the probability that $I_\alpha(x, \sigma)$ contains $E\hat f(x)$ is $1 - \alpha$. If we have prior information providing an upper bound for the bias, $|\mathrm{Bias}(\hat f(x))| \le B(x)$, then a confidence interval for $f(x)$ with coverage at least $1 - \alpha$ is $\hat f(x) \pm (z_{\alpha/2}\, \sigma \|K G_\lambda \phi(x)\| + B(x))$.

If the goal is to detect structure by studying the confidence intervals $I_\alpha(x, \sigma)$ for a range of values of $x$, then it is advisable to correct for the total number of intervals considered so as to control the rate of incorrect detections. One way to do this is by constructing $1 - \alpha$ confidence intervals of the form $I_\alpha(x, \sigma\beta)$ with simultaneous coverage for all $x$ in some closed set $S$. This requires finding a constant $\beta > 0$ such that

$$P[\,E\hat f(x) \in I_\alpha(x, \sigma\beta),\ \forall x \in S\,] \ge 1 - \alpha,$$

which is equivalent to

$$P\Big[\,\sup_{x \in S} |Z^t V(x)| \ge \beta\,\Big] \le \alpha,$$

where $V(x) = K G_\lambda \phi(x)/\|K G_\lambda \phi(x)\|$ and $Z_1, \ldots, Z_n$ are independent $N(0,1)$. We can use results regarding the tail behavior of maxima of Gaussian processes to find an appropriate value of $\beta$. For example, for the case when $x \in [a,b]$, [19] (see also [25]) shows that for $\beta$ large

$$P\Big[\,\sup_{x \in S} |Z^t V(x)| \ge \beta\,\Big] \approx \frac{v}{\pi}\, e^{-\beta^2/2} + 2(1 - \Phi(\beta)),$$

where $v = \int_a^b \|V'(x)\| \, dx$. It is not difficult to find a root of this nonlinear equation; the only potential problem may be computing $v$, but even an upper bound for it leads to intervals with simultaneous coverage at least $1 - \alpha$. Similar results can be derived for the case when $S$ is a subset of $\mathbb{R}^2$ or $\mathbb{R}^3$ [25].
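Given $v$, or an upper bound for it, the constant $\beta$ is readily found with a scalar root finder. A small sketch, assuming the tail approximation displayed above (the bracket [1, 10] is an illustrative choice):

import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def simultaneous_beta(alpha, v):
    # Solve (v/pi) exp(-beta^2/2) + 2 (1 - Phi(beta)) = alpha for beta.
    g = lambda b: (v / np.pi) * np.exp(-b**2 / 2) + 2 * (1 - norm.cdf(b)) - alpha
    return brentq(g, 1.0, 10.0)

print(simultaneous_beta(0.05, v=10.0))  # noticeably larger than z_{0.025} = 1.96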

An alternative approach is to use methods based on controlling the false discovery rate to correct for the interval coverage after the pointwise confidence intervals have been selected [4].

Estimating σ and λ

The formulas for the bias, variance, and confidence intervals described so far require knowledge of $\sigma$ and a selection of $\lambda$ that is independent of the data $y$. If $y$ is also used to estimate $\sigma$ or choose $\lambda$, then $\hat f(x)$ is no longer linear in $y$, and closed formulas for the moments, bias, or confidence intervals are not available. Still, the formulas derived above with "reasonable" estimates $\hat\sigma$ and $\hat\lambda$ in place of $\sigma$ and $\lambda$ are approximately valid. This depends of course on the class of possible functions $f$, the noise distribution, the signal-to-noise ratio, and the ill-posedness of the problem. We recommend conducting realistic simulation studies to understand the actual performance of the estimates for a particular problem.

Generalized cross-validation (GCV) methods to select $\lambda$ have proved useful in applications and theoretical studies. A discussion of these methods can be found in [13, 14, 29, 30]. We now summarize a few methods for obtaining an estimate of $\sigma$.

The estimate of $\sigma^2$ given by (5) could be readily used here, with, once again, $\mathrm{dof}(K\hat a) = \mathrm{tr}(H_\lambda)$ and the corresponding hat matrix $H_\lambda$ (provided $\delta = 0$). However, because of the bias of $K\hat a$ and the fixed error $\delta$, it is sometimes better to estimate $\sigma$ by considering the data as noisy observations of $\mu = Ey = K[f]$, in which case we may assume $\delta = 0$; that is, $y_i = \mu_i + \varepsilon_i$. This approach is natural, as $\sigma^2$ is the variance of the errors, $\varepsilon_i$, in the observations of $\mu_i$.

To estimate the variance of $\varepsilon_i$, we need to remove the trend $\mu_i$. This trend may be seen as the values of a function $\mu(x)$: $\mu_i = \mu(x_i)$. A variety of nonparametric regression methods can be used to estimate the function $\mu$ so it can be removed, and an estimate of the noise variance can be obtained (e.g., [15, 16]). If the function can be assumed to be reasonably smooth, then we can use the framework described in Example 3 with $K_i[\mu] = \mu(x_i)$ and a penalty on the second derivative of $\mu$. In this case $\hat\mu(x) = \sum a_i \phi_i(x) = a^t \phi(x)$ is called a spline smoothing estimate because it is a finite linear combination of spline functions $\phi_i$ [15, 29]. The estimate of $\sigma^2$ defined by (5) with the corresponding hat matrix $H_\lambda$ was proposed by [28]. Using (8) and (9) we find that the expected value of the residual sum of squares is

$$E\|y - \hat y\|^2 = \sigma^2 \, \mathrm{tr}[(I - H_\lambda)^2] + a^t B_\lambda^t K^t K B_\lambda a, \qquad (10)$$

and thus $\hat\sigma^2$ is not an unbiased estimator of $\sigma^2$ even if the bias is zero (i.e., $B_\lambda a = 0$, which happens when $\mu$ is linear), but it has been shown to have good asymptotic properties when $\lambda$ is selected using generalized cross-validation [13, 28]. From (10) we see that a slight modification of $\hat\sigma^2$ leads to an estimate that is unbiased when $B_\lambda a = 0$ [5]:


$$\hat\sigma_B^2 = \frac{\|y - \hat y\|^2}{\mathrm{tr}[(I - H_\lambda)^2]}.$$

Simulation studies seem to indicate that this estimate has a smaller bias for a wider set of values of $\lambda$ [6]. This property is desirable, as $\lambda$ is usually chosen adaptively.
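Both estimates are immediate once the hat matrix is available; a minimal sketch in our own notation for the penalized least-squares setting:

import numpy as np

def sigma2_estimates(y, K, S, lam):
    # Returns the estimate (5) and the bias-corrected sigma2_B of [5].
    G = np.linalg.inv(K.T @ K + lam**2 * S)
    H = K @ G @ K.T                     # hat matrix H_lambda
    rss = np.sum((y - H @ y)**2)        # residual sum of squares
    R = np.eye(len(y)) - H
    return rss / np.trace(R), rss / np.trace(R @ R)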

In some cases the effect of the trend can also be reduced using first- or second-order finite differences without having to choose a regularization parameter $\lambda$. For example, a first-order finite-difference estimate proposed by [21] is

$$\hat\sigma_R^2 = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} (y_{i+1} - y_i)^2.$$

The bias of $\hat\sigma_R^2$ is small if the local changes of $\mu(x)$ are small. In particular, the bias is zero for a linear trend. Other estimators of $\sigma$ as well as performance comparisons can be found in [5, 6].
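The estimator $\hat\sigma_R^2$ requires no fitting at all; for instance:

import numpy as np

def sigma2_rice(y):
    # First-order finite-difference variance estimate of [21].
    d = np.diff(y)
    return np.sum(d**2) / (2 * (len(y) - 1))

rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 1, 500)) + 0.1 * rng.standard_normal(500)
print(sigma2_rice(y))   # close to 0.1**2 = 0.01 for this slowly varying trend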

Resampling Methods

We have assumed a Gaussian noise distribution for the construction of confidence intervals. In addition, we have only considered linear operators and linear estimators. Nonlinear estimators arise even when the operator $K$ is linear. For example, if $\lambda$ and $\sigma$ are estimated using the same data, or if the penalized least-squares estimate of $a$ includes interval constraints (e.g., positivity), then the estimate $\hat a$ is no longer linear in $y$. In some cases the use of bootstrap (resampling) methods allows us to assess statistical properties while relaxing the distributional and linearity assumptions.

The idea is to simulate data $y^*$ as follows: the function estimate $\hat f$ is used as a proxy for the unknown function $f$. Noise $\varepsilon^*$ is simulated using a parametric or nonparametric method. In the parametric bootstrap, $\varepsilon^*$ is sampled from the assumed distribution whose parameters are estimated from the data. For example, if the $\varepsilon_i$ are independent $N(0, \sigma^2)$, then $\varepsilon_i^*$ is sampled from $N(0, \hat\sigma^2)$. In the nonparametric bootstrap, $\varepsilon_i^*$ is sampled with replacement from the vector of residuals of the fit. However, as Eqs. (8) and (9) show, even in the linear case, the residuals have to be corrected to behave approximately like the true errors. Of course, due to the bias and correlation of the residuals, these corrections are often difficult to derive and implement. Using $\varepsilon_i^*$ and $\hat f$, one generates simulated data vectors $y_j^* = K[\hat f] + \varepsilon_j^*$. For each such $y_j^*$ one computes an estimate $\hat f_j$ of $f$ following the same procedure used to obtain $\hat f$. The statistics of the sample of $\hat f_j$ are used as estimates of those of $\hat f$. One problem with this approach is that the bias of $\hat f$ may lead to a poor estimate of $K[f]$ and thus to unrealistic simulated data.
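A schematic implementation of the nonparametric variant might look as follows (ours; fit stands for whatever estimator produced $\hat a$, and the residual corrections discussed above are deliberately omitted for brevity):

import numpy as np

def residual_bootstrap(y, K, fit, n_boot=200, seed=None):
    rng = np.random.default_rng(seed)
    a_hat = fit(y)
    resid = y - K @ a_hat              # raw residuals; cf. the caveat above
    reps = []
    for _ in range(n_boot):
        e_star = rng.choice(resid, size=len(y), replace=True)
        reps.append(fit(K @ a_hat + e_star))  # refit on simulated data y*
    return np.array(reps)              # bootstrap sample of a_hat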

An introduction to bootstrap methods can be found in [9, 12]. For an example of bootstrap methods to construct confidence intervals for estimates of a function based on smoothing splines, see [31].

References

1. Aguilar, O., Allmaras, M., Bangerth, W., Tenorio, L.: Statistics of parameter estimates: a concrete example. SIAM Rev. 57, 131 (2015)
2. Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 89, 337 (1950)
3. Backus, G., Gilbert, F.: Uniqueness in the inversion of inaccurate gross earth data. Philos. Trans. R. Soc. Lond. A 266, 123 (1970)
4. Benjamini, Y., Yekutieli, D.: False discovery rate–adjusted multiple confidence intervals for selected parameters. J. Am. Stat. Assoc. 100, 71 (2005)
5. Buckley, M.J., Eagleson, G.K., Silverman, B.W.: The estimation of residual variance in nonparametric regression. Biometrika 75, 189 (1988)
6. Carter, C.K., Eagleson, G.K.: A comparison of variance estimators in nonparametric regression. J. R. Stat. Soc. B 54, 773 (1992)
7. Cavalier, L.: Nonparametric statistical inverse problems. Inverse Probl. 24, 034004 (2008)
8. Craven, P., Wahba, G.: Smoothing noisy data with splines. Numer. Math. 31, 377 (1979)
9. Davison, A.C., Hinkley, D.V.: Bootstrap Methods and Their Application. Cambridge University Press, Cambridge (1997)
10. Engl, H., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Dordrecht (1996)
11. Engl, H.: see his entry in this encyclopedia
12. Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)
13. Gu, C.: Smoothing Spline ANOVA Models. Springer, Berlin/Heidelberg/New York (2002)
14. Gu, C.: Smoothing noisy data via regularization: statistical perspectives. Inverse Probl. 24, 034002 (2008)
15. Green, P.J., Silverman, B.W.: Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman & Hall, London (1993)
16. Härdle, W.K., Müller, M., Sperlich, S., Werwatz, A.: Nonparametric and Semiparametric Models. Springer, Berlin/Heidelberg/New York (2004)
17. Kaipio, J., Somersalo, E.: Statistical and Computational Inverse Problems. Springer, Berlin/Heidelberg/New York (2004)
18. Mair, B., Ruymgaart, F.H.: Statistical estimation in Hilbert scale. SIAM J. Appl. Math. 56, 1424 (1996)
19. Naiman, D.Q.: Conservative confidence bands in curvilinear regression. Ann. Stat. 14, 896 (1986)
20. O'Sullivan, F.: A statistical perspective on ill-posed inverse problems. Stat. Sci. 1, 502 (1986)
21. Rice, J.: Bandwidth choice for nonparametric regression. Ann. Stat. 12, 1215 (1984)
22. Rudin, W.: Functional Analysis. McGraw-Hill, New York (1973)
23. Stark, P.B., Tenorio, L.: A primer of frequentist and Bayesian inference in inverse problems. In: Biegler, L., et al. (eds.) Computational Methods for Large-Scale Inverse Problems and Quantification of Uncertainty, pp. 9–32. Wiley, Chichester (2011)
24. Stein, C.: Estimation of the mean of a multivariate normal distribution. Ann. Stat. 9, 1135 (1981)
25. Sun, J., Loader, C.R.: Simultaneous confidence bands for linear regression and smoothing. Ann. Stat. 22, 1328 (1994)
26. Tenorio, L., Andersson, F., de Hoop, M., Ma, P.: Data analysis tools for uncertainty quantification of inverse problems. Inverse Probl. 29, 045001 (2011)
27. Vogel, C.R.: Computational Methods for Inverse Problems. SIAM, Philadelphia (2002)
28. Wahba, G.: Bayesian "confidence intervals" for the cross-validated smoothing spline. J. R. Stat. Soc. B 45, 133 (1983)
29. Wahba, G.: Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 50. SIAM, Philadelphia (1990)
30. Wahba, G.: see her entry in this encyclopedia
31. Wang, Y., Wahba, G.: Bootstrap confidence intervals for smoothing splines and their comparison to Bayesian confidence intervals. J. Stat. Comput. Simul. 51, 263 (1995)

Step Size Control

Gustaf Söderlind
Centre for Mathematical Sciences, Numerical Analysis, Lund University, Lund, Sweden

Introduction

Step size control is used to make a numerical method that proceeds in a step-by-step fashion adaptive. This includes time stepping methods for solving initial value problems, nonlinear optimization methods, and continuation methods for solving nonlinear equations. The objective is to increase efficiency, but also includes managing the stability of the computation.

This entry focuses exclusively on time stepping adaptivity in initial value problems. Special control algorithms continually adjust the step size in accordance with the local variation of the solution, attempting to compute a numerical solution to within a given error tolerance at minimal cost. As a typical integration may run over thousands of steps, the task is ideally suited to proven methods from automatic control.

Assume that the problem to be solved is a dynamical system,

$$\frac{dy}{dt} = f(y); \quad y(0) = y_0, \qquad (1)$$

with $y(t) \in \mathbb{R}^m$. Without loss of generality, we may assume that the problem is solved numerically using a one-step integration procedure, explicit or implicit, written formally as

$$y_{n+1} = \Phi_h(y_n); \quad y_0 = y(0), \qquad (2)$$

where the map $\Phi_h$ advances the solution one step of size $h$, from time $t_n$ to $t_{n+1} = t_n + h$. Here the sequence $y_n$ is the numerical approximation to the exact solution, $y(t_n)$. The difference $e_n = y_n - y(t_n)$ is the global error of the numerical solution. If the method is of convergence order $p$, and the vector field $f$ in (1) is sufficiently differentiable, then $\|e_n\| = O(h^p)$ as $h \to 0$.

The accuracy of the numerical solution can also be evaluated locally. The local error $l_n$ is defined by

$$y(t_{n+1}) + l_n = \Phi_h(y(t_n)). \qquad (3)$$

Thus, if the method would take a step of size $h$, starting on the exact solution $y(t_n)$, it will deviate from the exact solution at $t_{n+1}$ by a small amount, $l_n$. If the method is of order $p$, the local error will satisfy

$$\|l_n\| = \varphi_n h^{p+1} + O(h^{p+2}), \quad h \to 0. \qquad (4)$$

Here the principal error function $\varphi_n$ varies along the solution and depends on the problem (in terms of derivatives of $f$) as well as on the method.

Using differential inequalities, it can be shown that the global and local errors are related by the a priori error bound

$$\|e_n\| \propto \max_{m \le n} \frac{\|l_m\|}{h} \cdot \frac{e^{M[f]\, t_n} - 1}{M[f]}, \qquad (5)$$

where $M[f]$ is the logarithmic Lipschitz constant of $f$. Thus, the global error is bounded in terms of the local error per unit step, $l_n/h$. For this reason, one can manage the global error by choosing the step size $h$ so as to keep $\|l_n\|/h = \mathrm{TOL}$ during the integration, where TOL is a user-prescribed local error tolerance. The global error is then proportional to TOL, by a factor that reflects the intrinsic growth or decay of solutions to (1). Good initial value problem solvers usually produce numerical results that reflect this tolerance proportionality. By reducing TOL, one reduces the local error as well as the global error, while the computational cost increases, as $h \sim \mathrm{TOL}^{1/p}$.

Although it is possible to compute a posteriori global error estimates, such estimates are often costly. All widely used solvers therefore control the local error, relying on the relation (5) and on the possibility of comparing several different numerical solutions computed for different values of TOL. There is no claim that the step size sequences are "optimal," but in all problems where the principal error function varies by several orders of magnitude, as is the case in stiff differential equations, local error control is an inexpensive tool that offers vastly increased performance. It is a necessity for efficient computations.

A time stepping method is made adaptive by providing a separate procedure for updating the step size as a function of the numerical solution. Thus, an adaptive method can be written formally as the interactive recursion

$$y_{n+1} = \Phi_{h_n}(y_n), \qquad (6)$$
$$h_{n+1} = \Psi_{y_{n+1}}(h_n), \qquad (7)$$

where the first equation represents the numerical method and the second the step size control. If $\Psi_y \equiv I$ (the identity map), the scheme reduces to a constant step size method. Otherwise, the interaction between the two dynamical systems implies that step size control interferes with the stability of the numerical method. For this reason, it is important that step size control algorithms are designed to increase efficiency without compromising stability.

Basic Multiplicative Control

Modern time stepping methods provide a local error estimate. By using two methods of different orders, computing two results, $y_{n+1}$ and $\hat y_{n+1}$, from $y_n$, the solver estimates the local error by $r_n = \|y_{n+1} - \hat y_{n+1}\|$. To control the error, the relation between step size and error is modeled by

$$r_n = \hat\varphi_n h_n^k. \qquad (8)$$

Here $k$ is the order of the local error estimator. Depending on the estimator's construction, $k$ may or may not equal $p+1$, where $p$ is the order of the method used to advance the solution. For control purposes, however, it is sufficient that $k$ is known, and that the method operates in the asymptotic regime, meaning that (8) is an accurate model of the error for the step sizes in use.

The common approach to varying the step size is multiplicative control,

$$h_{n+1} = \rho_n \cdot h_n, \qquad (9)$$

where the factor $\rho_n$ needs to be determined so that the error estimate $r_n$ is kept near the target value TOL for all $n$.

A simple control heuristic is derived by requiring that the next step size $h_{n+1}$ solve the equation $\mathrm{TOL} = \hat\varphi_n h_{n+1}^k$; this assumes that $\hat\varphi_n$ varies slowly. Thus, dividing this equation by (8), one obtains

$$h_{n+1} = \left(\frac{\mathrm{TOL}}{r_n}\right)^{1/k} h_n. \qquad (10)$$

This multiplicative control is found in many solvers. It is usually complemented by a range of safety measures, such as limiting the maximum step size increase, preventing "too small" step size changes, and special schemes for recomputing a step, should the estimated error be much larger than TOL.
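In code, the controller (10) is a single line. The toy loop below (our sketch; the error model $r = c h^k$ stands in for a real solver) shows how it drives the error estimate to the set point:

def next_step(h, r, tol, k):
    # Elementary integrating controller (10).
    return h * (tol / r)**(1.0 / k)

c, k, tol = 50.0, 3, 1e-6      # assumed error model r = c * h^k
h = 0.1
for _ in range(4):
    r = c * h**k               # stand-in for the solver's error estimate
    h = next_step(h, r, tol, k)
    print(h, c * h**k / tol)   # the ratio r/TOL is driven to 1

Because the model is exact here, the target is hit in a single update; this "deadbeat" behavior of (10) is discussed further below.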

Although it often works well, the control law (10) and its safety measures have several disadvantages that call for more advanced feedback control schemes. Control theory and digital signal processing, both based on linear difference equations, offer a wide range of proven tools that are suitable for controlling the step size. Taking logarithms, (10) can be written as the linear difference equation

$$\log h_{n+1} = \log h_n - \frac{1}{k} \log \hat r_n, \qquad (11)$$

where $\hat r_n = r_n/\mathrm{TOL}$. This recursion continually changes the step size, unless $\log \hat r_n$ is zero (i.e., $r_n = \mathrm{TOL}$). If $\hat r_n > 1$ the step size decreases, and if $\hat r_n < 1$ it increases. Thus, the error $r_n$ is kept near the set point TOL. As (11) is a summation process, the controller is referred to as an integrating controller, or I control. This integral action is necessary in order to eliminate a persistent error, and to find the step size that makes $r_n = \mathrm{TOL}$.

The difference equation (11) may be viewed as using the explicit Euler method for integrating a differential equation that represents a continuous control. Just as there are many different methods for solving differential equations, however, there are many different discrete-time controllers that can potentially be optimized for different numerical methods or problem types, offering different stability properties.

General Multiplicative Control

In place of (11), a general controller takes the errorsequence log Or D flog Orng as input, and produces a stepsize sequence logh D floghng via a linear differenceequation,

$(E - 1)\,Q(E) \log h = -P(E) \log \hat{r}. \qquad (12)$

Here $E$ is the forward shift operator, and $P$ and $Q$ are two polynomials of equal degree, making the recursion explicit. The special case (11) has $Q(E) \equiv 1$ and $P(E) \equiv 1/k$, and is a one-step controller, while (12) in general is a multistep controller. Finally, the factor $E - 1$ in (12) is akin to the consistency condition in linear multistep methods. Thus, if $\log \hat{r} \equiv 0$, a solution to (12) is $\log h \equiv \mathrm{const}$.

If, for example, $P(z) = \beta_1 z + \beta_0$ and $Q(z) = z + \alpha_0$, then the recursion (12) is equivalent to the two-step multiplicative control,

$h_{n+1} = \left( \frac{\mathrm{TOL}}{r_n} \right)^{\beta_1} \left( \frac{\mathrm{TOL}}{r_{n-1}} \right)^{\beta_0} \left( \frac{h_n}{h_{n-1}} \right)^{-\alpha_0} h_n. \qquad (13)$

By taking logarithms, it is easily seen to correspond to (12). One could include more factors following the same pattern, but in general it rarely pays off to use a longer history of step sizes and errors than two to three steps. Because of the simple structure of (13), it is relatively straightforward to include more advanced controllers in existing codes, keeping in mind that a multistep controller is started either by using (10), or by merely putting all factors representing nonexistent starting data equal to one. Examples of how to choose the parameters in (13) are found in Table 1.

A causal digital filter is a linear difference equation of the form (12), converting the input signal $\log \hat{r}$ to an output $\log h$. This implies that digital control and filtering are intimately related. There are several important filter structures that fit the purpose of step size control, all covered by the general controller (13). Among them are finite impulse response (FIR) filters; proportional-integral (PI) controllers; autoregressive (AR) filters; and moving average (MA) filters. These filter classes are not mutually exclusive but can be combined.

The elementary controller (10) is a FIR filter, also known as a deadbeat controller. Such controllers have the quickest dynamic response to variations in $\log \hat{\varphi}$, but also tend to produce nonsmooth step size sequences, and sometimes display ringing or stability problems. These problems can be eliminated by using PI controllers and MA filters that improve stability and suppress step size oscillations. Filter design is a matter of determining the filter coefficients with respect to order conditions and stability criteria, and is reminiscent of the construction of linear multistep methods [11].

Stability and Frequency Response

Controllers are analyzed and designed by investigating the closed loop transfer function. In terms of the $z$ transform of (12), the control action is

$\log h = -C(z) \log \hat{r}, \qquad (14)$

where the control transfer function is given by

$C(z) = \frac{P(z)}{(z - 1)\,Q(z)}. \qquad (15)$

Similarly, the error model (8) can be written

$\log \hat{r} = k \cdot \log h + \log \hat{\varphi} - \log \mathrm{TOL}. \qquad (16)$


Step Size Control, Fig. 1 Time step adaptivity viewed as a feedback control system. The computational process takes a step size $\log h$ as input and produces an error estimate $\log r = k \log h + \log \hat{\varphi}$. Representing the ODE, the principal error function $\log \hat{\varphi}$ enters as an additive disturbance, to be compensated by the controller. The error estimate $\log r$ is fed back and compared to $\log \mathrm{TOL}$. The controller constructs the next step size through $\log h = C(z) \cdot (\log \mathrm{TOL} - \log r)$ (From [11])

These relations and their interaction are usually illustrated in a block diagram, see Fig. 1.

Overall stability depends on the interaction between the controller $C(z)$ and the computational process. Inserting (16) into (14) and solving for $\log h$ yields

$\log h = \frac{-C(z)}{1 + kC(z)} \log \hat{\varphi} + \frac{C(z)}{1 + kC(z)} \log \mathrm{TOL}. \qquad (17)$

Here the closed loop transfer function $H(z) : \log \hat{\varphi} \mapsto \log h$ is defined by

$H(z) = \frac{-C(z)}{1 + kC(z)} = \frac{-P(z)}{(z - 1)\,Q(z) + kP(z)}. \qquad (18)$

It determines the performance of the combined system of step size controller and computational process, and, in particular, how successful the controller will be in adjusting $\log h$ to $\log \hat{\varphi}$ so that $\log r \approx \log \mathrm{TOL}$.

For a controller or a filter to be useful, the closed loop must be stable. This is determined by the poles of $H(z)$, which are the roots of the characteristic equation $(z - 1)Q(z) + kP(z) = 0$. These must be located well inside the unit circle, and preferably have positive real parts, so that homogeneous solutions decay quickly without oscillations.
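The pole locations are easy to inspect numerically. The fragment below, a small sketch with assumed example values (the H211b filter of Table 1 with b = 4), forms the characteristic polynomial $(z-1)Q(z) + kP(z)$, exploiting the fact that Table 1 specifies the products $k\beta_j$, and prints its roots.

    import numpy as np

    # assumed example configuration: H211b filter with b = 4 (cf. Table 1)
    b = 4.0
    kbeta1 = kbeta0 = alpha0 = 1.0 / b     # Table 1 lists the products k*beta_j

    # characteristic polynomial (z - 1)Q(z) + kP(z), with Q(z) = z + alpha0
    # and kP(z) = kbeta1*z + kbeta0, expands to
    #   z^2 + (alpha0 - 1 + kbeta1) z + (kbeta0 - alpha0)
    poles = np.roots([1.0, alpha0 - 1.0 + kbeta1, kbeta0 - alpha0])
    print(poles)    # here: 0.5 and 0.0, well inside the unit circle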

To assess frequency response, one takes $\log \hat{\varphi} = \{e^{i\omega n}\}$ with $\omega \in [0, \pi]$ to investigate the output $\log h = H(e^{i\omega})\{e^{i\omega n}\}$. The amplitude $|H(e^{i\omega})|$ measures the attenuation of the frequency $\omega$. By choosing $P$ such that $P(e^{i\omega^*}) = 0$ for some $\omega^*$, it follows that $H(e^{i\omega^*}) = 0$. Thus, zeros of $P(z)$ block signal transmission. The natural choice is $\omega^* = \pi$ so that $P(-1) = 0$, as this will annihilate $(-1)^n$ oscillations, and produce a smooth step size sequence. This is achieved by the two H211 controllers in Table 1. A smooth step size sequence is of importance, for example, to prevent higher order BDF methods from suffering stability problems.

Step Size Control, Table 1 Some recommended two-step controllers. The H211 controllers produce smooth step size sequences, using a moving average low-pass filter. In H211b, the filter can be adjusted. Starting at b = 2 it is a deadbeat (FIR) filter; as the parameter b increases, dynamic response slows and high frequency suppression (smoothing) increases. Note that the $\beta_j$ coefficients are given in terms of the product $k\beta_j$ for use with error estimators of different orders $k$. The $\alpha$ coefficient is however independent of $k$ (From [11])

$k\beta_1$   $k\beta_0$   $\alpha_0$   Type     Name      Usage
3/5          -1/5         --           PI       PI.4.2    Nonstiff solvers
1/b          1/b          1/b          MA       H211b     Stiff solvers; b in [2, 6]
1/6          1/6          --           MA+PI    H211 PI   Stiff problems, smooth solutions

Implementation and Modes of Operation

Carefully implemented adaptivity algorithms are central for the code to operate efficiently and reliably for broad classes of problems. Apart from the accuracy requirements, which may be formulated in many different ways, there are several other factors of importance in connection with step size control.

EPS Versus EPUS

For a code that emphasizes asymptotically correct error estimates, controlling the local error per unit step, $\|l_n\|/h_n$, is necessary in order to accumulate a targeted global error over a fixed integration range, regardless of the number of steps needed to complete the integration. Abbreviated EPUS, this approach is viable for nonstiff problems, but tends to be costly for stiff problems, where strong dissipation usually means that the global error is dominated by the most recent local errors. There, controlling the local error per step, $\|l_n\|$, referred to as EPS, is often a far more efficient option, if less well aligned with theory. In modern codes, the trend is generally to put less emphasis on asymptotically correct estimates, and to control $\|l_n\|$ directly. This has few practical drawbacks, but it makes it less straightforward to compare the performance of two different codes.

Computational Stability

Just as a well-conditioned problem depends continuously on the data, the computational procedure should depend continuously on the various parameters that control the computation. In particular, for tolerance proportionality, there should be constants $c$ and $C$ such that the global error $e$ can be bounded above and below,

$c \cdot \mathrm{TOL}^{\nu} \le \|e\| \le C \cdot \mathrm{TOL}^{\nu}, \qquad (19)$

where the method is tolerance proportional if $\nu = 1$. The smaller the ratio $C/c$, the better is the computational stability, but $C/c$ can only be made small with carefully implemented tools for adaptivity. Thus, with the elementary controller (10), prevented from making small step size changes, $C/c$ is typically large, whereas if the controller is based on a digital filter (here H211b with $b = 4$, cf. Table 1) allowing a continual change of the step size, the global error becomes a smooth function of TOL, see Fig. 2. This also shows that the behavior of an implementation is significantly affected by the control algorithms, and how they are implemented.

Absolute and Relative Errors

All modern codes provide options for controlling both absolute and relative errors. If at any given time, the estimated error is $\hat{l}$ and the computed solution is $y$, then a weighted error vector $d$ with components

$d_i = \frac{\hat{l}_i}{\tau_i + |y_i|} \qquad (20)$

is constructed, where $\tau_i$ is a scaling factor, determining a gradual switchover from relative to absolute error as $y_i \to 0$. The error (and the step) is accepted if $\|d\| \le \mathrm{TOL}$, and the expression $\mathrm{TOL}/\|d\|$ corresponds to the factors $\mathrm{TOL}/r$ in (10) and (13). By (20),

$\frac{d_i}{\mathrm{TOL}} = \frac{\hat{l}_i}{\mathrm{TOL} \cdot \tau_i + \mathrm{TOL} \cdot |y_i|}. \qquad (21)$

Most codes employ two different tolerance parameters, ATOL and RTOL, defined by $\mathrm{ATOL}_i = \mathrm{TOL} \cdot \tau_i$ and $\mathrm{RTOL} = \mathrm{TOL}$, respectively, replacing the denominator in (21) by $\mathrm{ATOL}_i + \mathrm{RTOL} \cdot |y_i|$. Thus, the user controls the accuracy by the vector ATOL and the scalar RTOL. For scaling purposes, it is also important to note that TOL and RTOL are dimensionless, whereas ATOL is not. The actual computational setup will make the step size control operate differently as the user-selected tolerance parameters affect both the set point and the control objective.
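As an illustration, the weighted error measure (20)-(21) might be assembled as in the sketch below; the max norm and the helper's name are assumptions, since codes differ in the norm they use.

    import numpy as np

    def error_ratio(l_est, y, atol, rtol):
        """Weighted error measure from (20)-(21): returns ||d||/TOL, so the
        step is acceptable when the result is <= 1. l_est is the estimated
        local error vector, atol the vector ATOL, rtol the scalar RTOL; the
        max norm is an illustrative choice (RMS norms are also common)."""
        w = atol + rtol * np.abs(y)        # denominators ATOL_i + RTOL * |y_i|
        return np.max(np.abs(l_est) / w)

    # 1/error_ratio(...) then plays the role of the factor TOL/r in (10) and (13)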

Interfering with the Controller

In most codes, a step is rejected if it exceeds TOL by a small amount, say 20%, calling for the step to be recomputed. As a correctly implemented controller is expectation-value correct, a too large error is almost invariably compensated by other errors being too small. It is, therefore, in general harmless to accept steps that exceed TOL by as much as a factor of 2, and indeed often preferable to minimize interference with the controller's dynamics.

Other types of interference may come from conditions that prevent "small" step size changes, as this might call for a refactorization of the Jacobian. However, such concerns are not warranted with smooth controllers, which usually make small enough changes not to disturb the Newton process beyond what can be managed. On the contrary, a smoothly changing step size is beneficial for avoiding instability in multistep methods such as the BDF methods.

It is however necessary to interfere with the controller's action when there is a change of method order, or when a too large step size change is suggested. This is equivalent to encountering an error that is much larger or smaller than TOL. In the first case, the step needs to be rejected, and in the second, the step size increase must be held back by a limiter.

Special Problems

Conventional multiplicative control is not useful in connection with geometric integration, where it fails to preserve structure. The interaction (6, 7) shows that adaptive step size selection adds dynamics, interfering with structure preserving integrators.

Step Size Control, Fig. 2 Global error vs. TOL for a linear multistep code applied to a stiff nonlinear test problem. The left panel shows results when the controller is based on (10); in the right panel, it has been replaced by the digital filter H211b. Although computational effort remains unchanged, stability is much enhanced. The graphs also reveal that the code is not tolerance proportional.

A one-step method $\Phi_h : y_n \mapsto y_{n+1}$ is called symmetric if $\Phi_h^{-1} = \Phi_{-h}$. This is a minimal requirement for the numerical integration of, for example, reversible Hamiltonian systems, in order to nearly preserve action variables in integrable problems. To make such a method adaptive, symmetric step size control is also needed. An invertible step size map $\Psi_y : \mathbb{R} \to \mathbb{R}$ is called symmetric if $-\Psi_y$ is an involution, see Fig. 3. A symmetric $\Psi_y$ then maps $h_{n-1}$ to $h_n$ and $-h_n$ to $-h_{n-1}$, and only depends on $y_n$; with these conditions satisfied, the adaptive integration can be run in reverse time and retrace the numerical trajectory that was generated in forward time [8]. However, this cannot be achieved by multiplicative controllers (9), and a special, nonlinear controller is therefore necessary.

An explicit control recursion satisfying the requirements is either additive or inverse-additive, with the latter being preferable. Thus, a controller of the form

$\frac{1}{h_n} - \frac{1}{h_{n-1}} = G(y_n) \qquad (22)$

Step Size Control, Fig. 3 Symmetric adaptive integration in forward time (upper part), and reverse time (lower part) illustrate the interaction (6, 7). The symmetric step size map $\Psi_y$ governs both $h$ and $-h$ (From [8])

can be used, where the function $G$ needs to be chosen with respect to the symmetry and geometric properties of the differential equation to be solved. This approach corresponds to constructing a Hamiltonian continuous control system, which is converted to the discrete controller (22) by geometric integration of the control system. This leaves the long-term behavior of the geometric integrator intact, even in the presence of step size variation. It is also worth noting that (22) generates a smooth step size sequence, as $h_n - h_{n-1} = O(h_n h_{n-1})$.

This type of control does not work with an error estimate, but rather tracks a prescribed target function; it corresponds to keeping $h\,Q(y) = \mathrm{const}$, where $Q$ is a given functional reflecting the geometric structure of (1). One can then take $G(y) = \mathrm{grad}\,Q(y) \cdot f(y)/Q(y)$. For example, in celestial mechanics, $Q(y)$ could be selected as total centripetal acceleration; then the step size is small when centripetal acceleration is large and vice versa, concentrating the computational effort on those intervals where the solution of the problem changes rapidly and is more sensitive to perturbations.
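A minimal sketch of how the inverse-additive controller (22) could drive a symmetric integrator, using the choice $G(y) = \mathrm{grad}\,Q(y) \cdot f(y)/Q(y)$ discussed above; the one-step map `step`, the vector field `f`, and the functional `Q` are user-supplied placeholders.

    import numpy as np

    def reversible_integrate(step, f, Q, gradQ, y0, h0, n_steps):
        """Adaptive integration with the inverse-additive controller (22):
            1/h_n - 1/h_{n-1} = G(y_n),  G(y) = gradQ(y).f(y)/Q(y),
        which keeps h*Q(y) approximately constant and is time reversible
        when `step` is a symmetric one-step map (both assumed supplied)."""
        y, h = np.asarray(y0, dtype=float), h0
        traj = [y.copy()]
        for _ in range(n_steps):
            y = step(y, h)                      # symmetric one-step map Phi_h
            G = np.dot(gradQ(y), f(y)) / Q(y)   # controller input at the new state
            h = 1.0 / (1.0 / h + G)             # Eq. (22): additive update of 1/h
            traj.append(y.copy())
        return np.array(traj)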

Literature

Step size control has a long history, starting with the first initial value problem solvers around 1960, often using a simple step doubling/halving strategy. The controller (10) was soon introduced, and further developments quickly followed. Although the schemes were largely heuristic, performance tests and practical experience developed working standards. Monographs such as [1, 2, 6, 7, 10] all offer detailed descriptions.

The first full control theoretic analysis is found in [3, 4], explaining and overcoming some previously noted difficulties, developing proportional-integral (PI) and autoregressive (AR) controllers. Synchronization with Newton iteration is discussed in [5]. A complete framework for using digital filters and signal processing is developed in [11], focusing on moving average (MA) controllers. Further developments on how to obtain improved computational stability are discussed in [12].

The special needs of geometric integration are discussed in [8], although the symmetric controllers are not based on error control. Error control in implicit, symmetric methods is analyzed in [13].

References

1. Butcher, J.C.: Numerical Methods for Ordinary Differential Equations. Wiley, Chichester (2008)
2. Gear, C.W.: Numerical Initial Value Problems in Ordinary Differential Equations. Prentice Hall, Englewood Cliffs (1971)
3. Gustafsson, K.: Control theoretic techniques for stepsize selection in explicit Runge–Kutta methods. ACM TOMS 17, 533–554 (1991)
4. Gustafsson, K.: Control theoretic techniques for stepsize selection in implicit Runge–Kutta methods. ACM TOMS 20, 496–517 (1994)
5. Gustafsson, K., Söderlind, G.: Control strategies for the iterative solution of nonlinear equations in ODE solvers. SIAM J. Sci. Comp. 18, 23–40 (1997)
6. Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I: Nonstiff Problems, 2nd edn. Springer, Berlin (1993)
7. Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, 2nd edn. Springer, Berlin (1996)
8. Hairer, E., Söderlind, G.: Explicit, time reversible, adaptive step size control. SIAM J. Sci. Comp. 26, 1838–1851 (2005)
9. Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration. Structure-Preserving Algorithms for Ordinary Differential Equations, 2nd edn. Springer, Berlin (2006)
10. Shampine, L., Gordon, M.: Computer Solution of Ordinary Differential Equations: The Initial Value Problem. Freeman, San Francisco (1975)
11. Söderlind, G.: Digital filters in adaptive time-stepping. ACM Trans. Math. Softw. 29, 1–26 (2003)
12. Söderlind, G., Wang, L.: Adaptive time-stepping and computational stability. J. Comput. Appl. Math. 185, 225–243 (2006)
13. Stoffer, D.: Variable steps for reversible integration methods. Computing 55, 1–22 (1995)

Stochastic and Statistical Methods in Climate, Atmosphere, and Ocean Science

Daan Crommelin 1,2 and Boualem Khouider 3
1 Scientific Computing Group, Centrum Wiskunde and Informatica (CWI), Amsterdam, The Netherlands
2 Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Amsterdam, The Netherlands
3 Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada

Introduction

The behavior of the atmosphere, oceans, and climate is intrinsically uncertain. The basic physical principles that govern atmospheric and oceanic flows are well known, for example, the Navier-Stokes equations for fluid flow, thermodynamic properties of moist air, and the effects of density stratification and Coriolis force.

Page 92: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

1378 Stochastic and Statistical Methods in Climate, Atmosphere, and Ocean Science

Notwithstanding, there are major sources of randomness and uncertainty that prevent perfect prediction and complete understanding of these flows.

The climate system involves a wide spectrum of space and time scales, from processes occurring on the order of microns and milliseconds, such as the formation of cloud and rain droplets, to global phenomena involving annual and decadal oscillations, such as the El Niño-Southern Oscillation (ENSO) and the Pacific Decadal Oscillation (PDO) [5]. Moreover, climate records display a spectral variability ranging from 1 cycle per month to 1 cycle per 100,000 years [23]. The complexity of the climate system stems in large part from the inherent nonlinearities of fluid mechanics and the phase changes of water substances. The atmosphere and oceans are turbulent, nonlinear systems that display chaotic behavior (e.g., [39]). The time evolutions of the same chaotic system starting from two slightly different initial states diverge exponentially fast, so that chaotic systems are marked by limited predictability. Beyond the so-called predictability horizon (on the order of 10 days for the atmosphere), initial state uncertainties (e.g., due to imperfect observations) have grown to the point that straightforward forecasts are no longer useful.

Another major source of uncertainty stems from the fact that numerical models for atmospheric and oceanic flows cannot describe all relevant physical processes at once. These models are in essence discretized partial differential equations (PDEs), and the derivation of suitable PDEs (e.g., the so-called primitive equations) from more general ones that are less convenient for computation (e.g., the full Navier-Stokes equations) involves approximations and simplifications that introduce errors in the equations. Furthermore, as a result of spatial discretization of the PDEs, numerical models have finite resolution so that small-scale processes with length scales below the model grid scale are not resolved. These limitations are unavoidable, leading to model error and uncertainty.

The uncertainties due to chaotic behavior and unresolved processes motivate the use of stochastic and statistical methods for modeling and understanding climate, atmosphere, and oceans. Models can be augmented with random elements in order to represent time-evolving uncertainties, leading to stochastic models. Weather forecasts and climate predictions are increasingly expressed in probabilistic terms, making explicit the margins of uncertainty inherent to any prediction.

Statistical Methods

For assessment and validation of models, a comparison of individual model trajectories is typically not suitable, because of the uncertainties described earlier. Rather, the statistical properties of models are used to summarize model behavior and to compare against other models and against observations. Examples are the mean and variance of spatial patterns of rainfall or sea surface temperature, the time evolution of global mean temperature, and the statistics of extreme events (e.g., hurricanes or heat waves). Part of the statistical methods used in this context is fairly general, not specifically tied to climate-atmosphere-ocean science (CAOS). However, other methods are rather specific for CAOS applications, and we will highlight some of these here. General references on statistical methods in CAOS are [61, 62].

EOFs

A technique that is used widely in CAOS is Principal Component Analysis (PCA), also known as Empirical Orthogonal Function (EOF) analysis in CAOS. Consider a multivariate dataset $\Phi \in \mathbb{R}^{M \times N}$. In CAOS this will typically be a time series $\phi(t_1), \phi(t_2), \ldots, \phi(t_N)$ where each $\phi(t_n) \in \mathbb{R}^M$ is a spatial field (of, e.g., temperature or pressure). For simplicity we assume that the time mean has been subtracted from the dataset, so $\sum_{n=1}^{N} \Phi_{mn} = 0 \;\forall m$. Let $C$ be the $M \times M$ (sample) covariance matrix for this dataset:

$C = \frac{1}{N - 1} \Phi \Phi^T.$

We denote by $(\lambda_m, v_m)$, $m = 1, \ldots, M$ the ordered eigenpairs of $C$:

$C v_m = \lambda_m v_m, \qquad \lambda_m \ge \lambda_{m+1} \;\forall m.$

The ordering of the (positive) eigenvalues implies that the projection of the dataset onto the leading eigenvector $v_1$ gives the maximum variance among all projections. The next eigenvector $v_2$ gives the maximum variance among all projections orthogonal to $v_1$, $v_3$ gives maximum variance among all projections orthogonal to $v_1$ and $v_2$, etc. The fraction $\lambda_m / \sum_l \lambda_l$ equals the fraction of the total variance of the data captured by projection onto the $m$-th eigenvector $v_m$.

The eigenvectors $v_m$ are called the Empirical Orthogonal Functions (EOFs) or Principal Components (PCs). Projecting the original dataset $\Phi$ onto the leading EOFs, i.e., the projection/reduction

$\phi^r(t_n) = \sum_{m=1}^{M'} \alpha_m(t_n)\, v_m, \qquad M' \ll M,$

can result in a substantial data reduction while retaining most of the variance of the original data.

PCA is discussed in great detail in [27] and [59]. Over the years, various generalizations and alternatives for PCA have been formulated, for example, Principal Interaction and Oscillation Patterns [24], Nonlinear Principal Component Analysis (NLPCA) [49], and Nonlinear Laplacian Spectral Analysis (NLSA) [22]. These more advanced methods are designed to overcome limitations of PCA relating to the nonlinear or dynamical structure of datasets.

In CAOS, the EOFs $v_m$ often correspond to spatial patterns. The shape of the patterns of leading EOFs can give insight in the physical-dynamical processes underlying the dataset $\Phi$. However, this must be done with caution, as the EOFs are statistical constructions and cannot always be interpreted as having physical or dynamical meaning in themselves (see [50] for a discussion).

The temporal properties of the (time-dependent) coefficients $\alpha_m(t)$ can be analyzed by calculating, e.g., autocorrelation functions. Also, models for these coefficients can be formulated (in terms of ordinary differential equations (ODEs), stochastic differential equations (SDEs), etc.) that aim to capture the main dynamical properties of the original dataset or model variables $\phi(t)$. For such reduced models, the emphasis is usually on the dynamics on large spatial scales and long time scales. These are embodied by the leading EOFs $v_m$, $m = 1, \ldots, M'$, and their corresponding coefficients $\alpha_m(t)$, so that a reduced model ($M' \ll M$) can be well capable of capturing the main large-scale dynamical properties of the original dataset.

Inverse Modeling

One way of arriving at reduced models is inverse modeling, i.e., the dynamical model is obtained through statistical inference from time series data. The data can be the result of, e.g., projecting the dataset $\Phi$ onto the EOFs (in which case the data are time series of $\alpha(t)$). These models are often cast as SDEs whose parameters must be estimated from the available time series. If the SDEs are restricted to have linear drift and additive noise (i.e., restricted to be those of a multivariate Ornstein-Uhlenbeck (OU) process), the estimation can be carried out for high-dimensional SDEs rather easily. That is, assume the SDEs have the form

$d\alpha(t) = B\,\alpha(t)\,dt + \sigma\,dW(t), \qquad (1)$

in which $B$ and $\sigma$ are constant real $M' \times M'$ matrices and $W(t)$ is an $M'$-dimensional vector of independent Wiener processes (for simplicity we assume that $\alpha$ has zero mean). The parameters of this model are the matrix elements of $B$ and $\sigma$. They can be estimated from two (lagged) covariance matrices of the time series. If we define

$R^0_{ij} = \mathbb{E}\,\alpha_i(t)\,\alpha_j(t), \qquad R^\tau_{ij} = \mathbb{E}\,\alpha_i(t)\,\alpha_j(t + \tau),$

with $\mathbb{E}$ denoting expectation, then for the OU process (1), we have the relations

$R^\tau = \exp(B\tau)\,R^0$

and

$B R^0 + R^0 B^T + \sigma\sigma^T = 0.$

The latter of these is the fluctuation-dissipation relation for the OU process. By estimating $R^0$ and $R^\tau$ (with some $\tau > 0$) from time series of $\alpha$, estimates for $B$ and $A := \sigma\sigma^T$ can be easily computed using these relations. This procedure is sometimes referred to as linear inverse modeling (LIM) in CAOS [55]. The matrix $\sigma$ cannot be uniquely determined from $A$; however, any $\sigma$ for which $A = \sigma\sigma^T$ (e.g., obtained by Cholesky decomposition of $A$) will result in an OU process with the desired covariances $R^0$ and $R^\tau$.

As mentioned, LIM can be carried out rather easily for multivariate processes. This is a major advantage of LIM. A drawback is that the OU process (1) cannot capture non-Gaussian properties, so that LIM can only be used for data with Gaussian distributions. Also, the estimated $B$ and $A$ are sensitive to the choice of $\tau$, unless the available time series is an exact sampling of (1).
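The LIM estimation step itself is short. The following sketch (with an assumed sampling interval `dt` and scipy's matrix logarithm) estimates $B$ from $R^\tau = \exp(B\tau)R^0$ and $A$ from the fluctuation-dissipation relation; it presumes the time series has zero mean and that $A$ comes out positive definite, so that a Cholesky factor exists.

    import numpy as np
    from scipy.linalg import logm

    def lim_estimate(alpha, dt, lag=1):
        """Estimate B and A = sigma sigma^T of d(alpha) = B alpha dt + sigma dW
        from the covariances R0 and R_tau, with tau = lag*dt.
        alpha: zero-mean time series of shape (M', N)."""
        a0, a1 = alpha[:, :-lag], alpha[:, lag:]
        N = a0.shape[1]
        R0 = a0 @ a0.T / N                    # R0 = E alpha(t) alpha(t)^T
        Rtau = a1 @ a0.T / N                  # R_tau = E alpha(t+tau) alpha(t)^T
        B = logm(Rtau @ np.linalg.inv(R0)).real / (lag * dt)  # R_tau = exp(B tau) R0
        A = -(B @ R0 + R0 @ B.T)              # fluctuation-dissipation relation
        sigma = np.linalg.cholesky(0.5 * (A + A.T))  # any sigma with sigma sigma^T = A
        return B, sigma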


Estimating diffusion processes with non-Gaussian properties is much more complicated. There are various estimation procedures available for SDEs with nonlinear drift and/or multiplicative noise; see, e.g., [30, 58] for an overview. However, the practical use of these procedures is often limited to SDEs with very low dimensions, due to the curse of dimension or to computational feasibility. For an example application in CAOS, see, e.g., [4].

The dynamics of given time series can also be captured by reduced models that have discrete state spaces, rather than continuous ones as in the case of SDEs. There are a number of studies in CAOS that employ finite-state Markov chains for this purpose (e.g., [8, 48, 53]). It usually requires discretization of the state space; this can be achieved with, e.g., clustering methods. A more advanced methodology, building on the concept of Markov chains yet resulting in continuous state spaces, is that of hidden Markov models. These have been used, e.g., to model rainfall data (e.g., [3, 63]) and to study regime behavior in large-scale atmospheric dynamics [41]. Yet a more sophisticated methodology that combines the clustering and Markov chain concepts, specifically designed for nonstationary processes, can be found in [25].

Extreme Events

The occurrence of extreme meteorological events, such as hurricanes, extreme rainfall, and heat waves, is of great importance because of their societal impact. Statistical methods to study extreme events are therefore used extensively in CAOS. The key question for studying extremes with statistical methods is to be able to assess the probability of certain events, having only a dataset available that is too short to contain more than a few of these events (and occasionally, too short to contain even a single event of interest). For example, how can one assess the probability of sea water level at some coastal location being more than 5 m above average if only 100 years of observational data for that location is available, with a maximum of 4 m above average? Such questions can be made accessible using extreme value theory. General introductions to extreme value theory are, e.g., [7] and [11]. For recent research on extremes in the context of climate science, see, e.g., [29] and the collection [1].

The classical theory deals with sequences or observations of $N$ independent and identically distributed (iid) random variables, denoted here by $r_1, \ldots, r_N$. Let $M_N$ be the maximum of this sequence, $M_N = \max\{r_1, \ldots, r_N\}$. If the probability distribution for $M_N$ can be rescaled so that it converges in the limit of increasingly long sequences (i.e., $N \to \infty$), it converges to a generalized extreme value (GEV) distribution. More precisely, if there are sequences $a_N (> 0)$ and $b_N$ such that $\mathrm{Prob}((M_N - b_N)/a_N \le z) \to G(z)$ as $N \to \infty$, then

$G(z) = \exp\left( -\left[ 1 + \xi\, \frac{z - \mu}{\sigma} \right]^{-1/\xi} \right).$

$G(z)$ is a GEV distribution, with parameters $\mu$ (location), $\sigma > 0$ (scale), and $\xi$ (shape). It combines the Fréchet ($\xi > 0$), Weibull ($\xi < 0$), and Gumbel ($\xi \to 0$) families of extreme value distributions. Note that this result is independent of the precise distribution of the random variables $r_n$. The parameters $\mu, \sigma, \xi$ can be inferred by dividing the observations $r_1, r_2, \ldots$ in blocks of equal length and considering the maxima on these blocks (the so-called block maxima approach).

An alternative method for characterizing extremes, making more efficient use of available data than the block maxima approach, is known as the peaks-over-threshold (POT) approach. The idea is to set a threshold, say $r^*$, and study the distribution of all observations $r_n$ that exceed this threshold. Thus, the object of interest is the conditional probability distribution $\mathrm{Prob}(r_n - r^* > z \mid r_n > r^*)$, with $z > 0$. Under fairly general conditions, this distribution converges to $1 - H(z)$ for high thresholds $r^*$, where $H(z)$ is the generalized Pareto distribution (GPD):

$H(z) = 1 - \left( 1 + \frac{\xi z}{\tilde{\sigma}} \right)^{-1/\xi}.$

The parameters of the GPD family of distributions are directly related to those of the GEV distribution: the shape parameter $\xi$ is the same in both, whereas the threshold-dependent scale parameter is $\tilde{\sigma} = \sigma + \xi(r^* - \mu)$, with $\mu$ and $\sigma$ as in the GEV distribution.

By inferring the parameters of the GPD or GEV distributions from a given dataset, one can calculate probabilities of extremes that are not present themselves in that dataset (but have the same underlying distribution as the available data). In principle, this makes it possible to assess risks of events that have not been observed, provided the conditions on convergence to GPD or GEV distributions are met.


As mentioned, classical results on extreme value theory apply to iid random variables. These results have been generalized to time-correlated random variables, both stationary and nonstationary [7]. This is important for weather and climate applications, where datasets considered in the context of extremes are often time series. Another relevant topic is the development of multivariate extreme value theory [11].
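As an illustration of the block maxima approach, scipy's GEV family can be fit to annual maxima; note that scipy's shape parameter `c` corresponds to $-\xi$ in the notation above, and the synthetic "daily" data here are only a stand-in for a real record.

    import numpy as np
    from scipy.stats import genextreme

    rng = np.random.default_rng(1)
    daily = rng.gumbel(loc=2.0, scale=0.5, size=100 * 365)  # 100 "years" of data

    # block maxima approach: one maximum per year
    annual_max = daily.reshape(100, 365).max(axis=1)

    # fit the GEV distribution; scipy's shape c corresponds to -xi
    c, mu, sigma = genextreme.fit(annual_max)

    # probability of an annual maximum exceeding a level never observed
    level = annual_max.max() + 1.0
    print(genextreme.sf(level, c, loc=mu, scale=sigma))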

Stochastic Methods

Given the sheer complexity of climate-atmosphere-ocean (CAO) dynamics, when studying the global climate system or some parts of global oscillation patterns such as ENSO or PDO, it is natural to try to separate the global dynamics occurring on longer time scales from local processes which occur on much shorter scales. Moreover, as mentioned before, climate and weather prediction models are based on a numerical discretization of the equations of motion, and due to limitations in computing resources, it is simply impossible to represent the wide range of space and time scales involved in CAO. Instead, general circulation models (GCMs) rely on parameterization schemes to represent the effect of the small/unresolved scales on the large/resolved scales. Below, we briefly illustrate how stochastic models are used in CAO both to build theoretical models that separate small-scale (noise) and large-scale dynamics and to "parameterize" the effect of small scales on large scales. A good snapshot of the state of the art, during the last two decades or so, in stochastic climate modeling research can be found in [26, 52].

Model Reduction for Noise-Driven Large-Scale Dynamics

In an attempt to explain the observed low-frequency variability of CAO, Hasselmann [23] splits the system into slow climate components (e.g., oceans, biosphere, cryosphere), denoted by the vector $x$, and fast components representing the weather, i.e., atmospheric variability, denoted by a vector $y$. The full climate system takes the form

$\frac{dx}{dt} = u(x, y), \qquad (2)$

$\frac{dy}{dt} = v(x, y),$

where $t$ is time and $u(x, y)$ and $v(x, y)$ contain the external forcing and internal dynamics that couple the slow and fast variables.

Hasselmann assumes a large scale separation between the slow and fast time scales: $\tau_y = O\left( y_j \left( \frac{dy_j}{dt} \right)^{-1} \right) \ll \tau_x = O\left( x_i \left( \frac{dx_i}{dt} \right)^{-1} \right)$, for all components $i$ and $j$. The time scale separation was used earlier to justify statistical dynamical models (SDM), used then to track the dynamics of the climate system alone under the influence of external forcing. Without the variability due to the internal interactions of CAO, the SDMs failed badly to explain the observed "red" spectrum which characterizes the low-frequency variability of CAO.

Hasselmann made the analogy with Brownian motion (BM), modeling the erratic movements of a few large particles immersed in a fluid that are subject to bombardments by the rapidly moving fluid molecules, as a "natural" extension of the SDM models. Moreover, Hasselmann [23] assumes that the variability of $x$ can be divided into a mean tendency $\langle dx/dt \rangle = \langle u(x, y) \rangle$ (here $\langle \cdot \rangle$ denotes the average with respect to the joint distribution of the fast variables) and a fluctuation tendency $dx'/dt = u(x, y) - \langle u(x, y) \rangle = u'(x, y)$ which, according to the Brownian motion problem, is assumed to be a pure diffusion process or white noise. However, unlike BM, Hasselmann argued that for the weather and climate system, the statistics of $y$ are not in equilibrium but depend on the slowly evolving large-scale dynamics and thus can only be obtained empirically. To avoid linear growth of the covariance matrix $\langle x' \otimes x' \rangle$, Hasselmann assumes a damping term proportional to the divergence of the background frequency $F(0)$ of $\langle x' \otimes x' \rangle$, where $\delta(\omega - \omega') F_{ij}(\omega) = \langle V_i(\omega) V_j(\omega') \rangle$ with $V(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} u'(t)\, e^{-i\omega t}\, dt$.

This leads to the Fokker-Planck equation [23]:

$\frac{\partial p(x, t)}{\partial t} + \nabla_x \cdot (\hat{u}(x)\, p(x, t)) = \nabla_x \cdot (D\, \nabla_x p(x, t)) \qquad (3)$

for the distribution $p(x, t)$ of $x(t)$ as a stochastic process given that $x(0) = x_0$, where $D$ is the normalized covariance matrix $D = \langle x' \otimes x' \rangle / 2t$ and $\hat{u} = \langle u \rangle - \pi\, \nabla_x \cdot F(0)$. Given the knowledge of the mean statistical forcing $\langle u \rangle$, the evolution equation for $p$ can be determined from the time series of $x$ obtained either from a climate model simulation or from observations. Notice also that for a large number of slow variables $x_i$,


the PDE in (3) is impractical; instead, one can always resort to Monte Carlo simulations using the associated Langevin equation:

$dx = \hat{u}(x)\,dt + \Sigma(x)\,dW_t, \qquad (4)$

where $\Sigma(x)\Sigma(x)^T = D(x)$. However, the functional dependence of $\hat{u}$ and $D$ remains ambiguous, and relying on rather empirical methods to define such terms is unsatisfactory. Nonetheless, Hasselmann introduced a "linear feedback" version of his model where the drift or propagation term is a negative definite linear operator, $\hat{u}(x) = Ux$, and $D$ is constant, independent of $x$, as an approximation for short time excursions of the climate variables. In this case, $p(x, t)$ is simply a Gaussian distribution whose time-dependent mean and variance are determined by the matrices $D$ and $U$, as noted in the inverse modeling section above.
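A sample path of the Langevin equation (4) can be generated with the Euler-Maruyama scheme. The sketch below treats the linear feedback case $\hat{u}(x) = Ux$ with constant noise; the matrices are arbitrary illustrative choices, not taken from any fitted climate model.

    import numpy as np

    def euler_maruyama(U, Sigma, x0, dt, n_steps, rng):
        """Simulate dx = U x dt + Sigma dW_t (the linear feedback model)."""
        d = len(x0)
        x = np.empty((n_steps + 1, d))
        x[0] = x0
        sqrt_dt = np.sqrt(dt)
        for n in range(n_steps):
            dW = rng.standard_normal(d) * sqrt_dt   # Wiener increments
            x[n + 1] = x[n] + U @ x[n] * dt + Sigma @ dW
        return x

    rng = np.random.default_rng(2)
    U = np.array([[-0.5, 1.0], [-1.0, -0.5]])   # stable (damped, oscillatory) drift
    Sigma = 0.3 * np.eye(2)                     # constant (additive) noise
    path = euler_maruyama(U, Sigma, np.zeros(2), 1e-2, 100_000, rng)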

Due to its simplicity, the linear feedback model is widely used to study the low-frequency variability of various climate processes. It is, for instance, used in [17] to reproduce the observed red spectrum of the sea surface temperature in midlatitudes using simulation data from a simplified coupled ocean-atmosphere model. However, this linear model has severe limitations, for example, not being able to represent deviations from the Gaussian distribution of some climate phenomena [13, 14, 17, 36, 51, 54]. It is thus natural to try to reincorporate a nonlinearity of some kind into the model. The most popular idea consisted in making the matrix $D$, or equivalently $\Sigma$, dependent on $x$ (quadratically for $D$ or linearly for $\Sigma$, as a next order Taylor correction), to which is tied the notion of multiplicative versus additive (when $D$ is constant) noise [37, 60]. Besides being a crude approximation, the apparent advantage of this approach is that the stabilizing linear operator $U$ is kept in place, although this is not universally justified.

A mathematical justification for Hasselmann's framework is provided by Arnold and his collaborators (see [2] and references therein). It is based on the well-known technique of averaging (the law of large numbers) and the central limit theorem. However, as in Hasselmann's original work, it assumes the existence and knowledge of the invariant measure of the fast variables. Nonetheless, a rigorous mathematical derivation of such Langevin-type models for the slow climate dynamics, using the equations of motion in discrete form, is possible as illustrated by the MTV theory presented next.

The Systematic Mode Reduction MTV Methodology

A systematic mathematical methodology to derive Langevin-type equations (4) à la Hasselmann, for the slow climate dynamics from the coupled atmosphere-ocean-land equations of motion, which yields the propagation (or drift) and diffusion terms $\hat{u}(x)$ and $D(x)$ in closed form, is presented in [44, 45] by Majda, Timofeyev, and Vanden-Eijnden (MTV).

Starting from the generalized form of the discretized equations of motion

$\frac{dz}{dt} = Lz + B(z, z) + f(t),$

where $L$ and $B$ are linear and bilinear operators, respectively, while $f(t)$ represents external forcing, MTV operate the same dichotomy as Hasselmann did of splitting the vector $z$ into slow and fast variables $x$ and $y$, respectively. However, they introduced a nondimensional parameter $\epsilon = \tau_y/\tau_x$ which measures the degree of time scale separation between the two sets of variables. This leads to the slow-fast coupled system

$dx = \epsilon^{-1}\left( L_{11}x + L_{12}y \right) dt + B^1_{11}(x, x)\,dt + \epsilon^{-1}\left[ B^1_{12}(x, y) + B^1_{22}(y, y) \right] dt + Dx\,dt + F_1(t)\,dt + \epsilon^{-1} f_1(\epsilon^{-1} t) \qquad (5)$

$dy = \epsilon^{-1}\left[ L_{21}x + L_{22}y + B^2_{12}(x, y) + B^2_{22}(y, y) \right] dt - \epsilon^{-2}\Gamma y\,dt + \epsilon^{-1}\sigma\,dW_t + \epsilon^{-1} f_2(\epsilon^{-1} t),$

under a few key assumptions, including (1) the nonlinear self-interaction term of the fast variables is "parameterized" by an Ornstein-Uhlenbeck process, $B^2_{22}(y, y)\,dt := -\epsilon^{-1}\Gamma y\,dt + \sqrt{\epsilon^{-1}}\,\sigma\,dW_t$; (2) a small dissipation term $\epsilon Dx\,dt$ is added to the slow dynamics; while (3) the slow variable forcing term assumes slow and fast contributions, $f_1(t) = \epsilon F_1(\epsilon t) + f_1(t)$. Moreover, the system in (5) is written in terms of the slow time $t \to \epsilon t$.

MTV used the theory of asymptotic expansions applied to the backward Fokker-Planck equation associated with the stochastic differential system in (5) to obtain an effective reduced Langevin equation (4) for the slow variables $x$ in the limit of large separation of time scales, $\epsilon \to 0$ [44, 45]. The main advantage of the MTV theory is that, unlike Hasselmann's ad hoc formulation, the functional forms of the drift and diffusion coefficients, in terms of the slow variables, are obtained, and new physical phenomena can emerge from the large-scale feedback besides the assumed stabilization effect. It turns out that the drift term is not always stabilizing, but there are dynamical regimes where growing modes can be excited and, depending on the dynamical configuration, the Langevin equation (4) can support either additive or multiplicative noise.

Even though MTV assumes strict separation of scales, $\epsilon \ll 1$, it is successfully used for a wide range of examples including cases where $\epsilon = O(1)$ [46]. Also, in [47], MTV is successfully extended to fully deterministic systems where the requirement that the fast-fast interaction term $B^2_{22}(y, y)$ in (5) is parameterized by an Ornstein-Uhlenbeck process is relaxed. Furthermore, MTV is applied to a wide range of climate problems. It is used, for instance, in [19] for a realistic barotropic model and extended in [18] to a three-layer quasi-geostrophic model. The example of midlatitude teleconnection patterns where multiplicative noise plays a crucial role is studied in [42]. MTV is also applied to the triad and dyad normal mode (EOF) interactions for arbitrary time series [40].

Stochastic Parametrization

In a typical GCM, the parametrization of unresolved processes is based on theoretical and/or empirical deterministic equations. Perhaps the area where deterministic parameterizations have failed the most is moist convection. GCMs fail very badly in simulating the planetary and intra-seasonal variability of winds and rainfall in the tropics due to the inadequate representation of the unresolved variability of convection and the associated cross-scale interactions behind the multiscale organization of tropical convection [35]. To overcome this problem, some climate scientists introduced random variables to mimic the variability of such unresolved processes. Unfortunately, as illustrated below, many of the existing stochastic parametrizations were based on the assumptions of statistical equilibrium and/or of a stationary distribution for the unresolved variability, which are only valid to some extent when there is scale separation.

The first use of random variables in GCMs appeared in Buizza et al. [6] as a means for improving the skill of the ECMWF ensemble prediction system (EPS). Buizza et al. [6] used uniformly distributed random scalars to rescale the parameterized tendencies in the governing equations. Similarly, Lin and Neelin [38] introduced a random perturbation in the tendency of convective available potential energy (CAPE). In [38], the random noise is assumed to be a first-order autoregressive (Markov) process of the form $\xi_{t+\Delta t} = \phi\, \xi_t + z_t$, where $z_t$ is a white noise with a fixed standard deviation and $\phi$ is a parameter. Plant and Craig [57] used extensive cloud-permitting numerical simulations to empirically derive the parameters for the PDF of the cloud base mass flux itself, whose Poisson shape is determined according to arguments drawn from equilibrium statistical mechanics. Careful simulations conducted by Davoudi et al. [10] revealed that while the Poisson PDF is more or less accurate for isolated deep convective clouds, it fails to extend to cloud clusters where a variety of cloud types interact with each other: a crucial feature of organized tropical convection.

Majda and Khouider [43] borrowed an idea from material science [28] of using the Ising model of ferromagnetization to represent convective inhibition (CIN). An order parameter $\sigma$, defined on a rectangular lattice embedded within each horizontal grid box of the climate model, takes values 1 or 0 at a given site, according to whether there is CIN or there is potential for deep convection (PAC). The lattice model makes transitions at a given site according to intuitive probability rules depending both on the large-scale climate model variables and on local interactions between lattice sites based on a Hamiltonian energy principle. The Hamiltonian is given by

$H(\sigma, U) = -\frac{1}{2} \sum_{x,y} J(|x - y|)\,\sigma(x)\,\sigma(y) + h(U) \sum_x \sigma_x,$

where $J(r)$ is the local interaction potential and $h(U)$ is the external potential, which depends on the climate variables $U$, and where the summations are taken over all lattice sites $x, y$. A transition (spin-flip, by analogy to the Ising model of magnetization) occurs at a site $y$ if, for a small time $\tau$, we have $\sigma_{t+\tau}(y) = 1 - \sigma_t(y)$ and $\sigma_{t+\tau}(x) = \sigma_t(x)$ if $x \ne y$. Transitions occur at a rate $C(y, \sigma, U)$ set by Arrhenius dynamics: $C(x, \sigma, U) = \frac{1}{\tau_I} \exp(-\Delta_x H(\sigma, U))$ if $\sigma_x = 0$ and $C(x, \sigma, U) = \frac{1}{\tau_I}$ if $\sigma_x = 1$, so that the resulting Markov process satisfies detailed balance with respect to the Gibbs distribution $\mu(\sigma, U) \propto \exp(-H(\sigma, U))$. Here $\Delta_x H(\sigma, U) = H(\sigma + [1 - \sigma(x)]e_x, U) - H(\sigma, U) = -\sum_z J(|x - z|)\sigma(z) + h(U)$, with $e_x(y) = 1$ if $y = x$ and 0 otherwise.

For computational efficiency, a coarse graining of the stochastic CIN model is used in [34] to derive a stochastic birth-death process for the mesoscopic area coverage $\sigma_X = \sum_{x \in X} \sigma(x)$, where $X$ represents a generic site of a mesoscopic lattice, which in practice can be considered to be the GCM grid. The stochastic CIN model is coupled to a toy GCM where it is successfully demonstrated how the addition of such a stochastic model could improve the climatology and waves dynamics in a deficient GCM [34, 42].
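To make the transition rules concrete, here is a minimal sketch of one grid box of the CIN lattice model with Arrhenius spin-flip rates, advanced by a simple random-sequential update with a small time step; the coupling constants, the external potential, the time scale, and the nearest-neighbor truncation of $J$ are placeholder choices, not the calibrated values of [34].

    import numpy as np

    rng = np.random.default_rng(3)
    n = 20                      # n x n microscopic lattice inside one GCM grid box
    sigma = rng.integers(0, 2, size=(n, n))
    tau_I = 1.0                 # interaction time scale (placeholder)
    J0, h_U = 0.2, 0.1          # nearest-neighbor coupling and external potential (placeholders)

    def delta_H(s, i, j):
        """Energy change Delta_x H = -sum_z J(|x - z|) sigma(z) + h(U),
        with J restricted to nearest neighbors for simplicity."""
        nbrs = s[(i - 1) % n, j] + s[(i + 1) % n, j] + s[i, (j - 1) % n] + s[i, (j + 1) % n]
        return -J0 * nbrs + h_U

    def sweep(s, dt):
        """One random-sequential sweep: each proposed flip occurs with
        probability rate * dt, for dt small (Arrhenius dynamics)."""
        for _ in range(n * n):
            i, j = rng.integers(0, n, size=2)
            rate = (1.0 / tau_I) * (np.exp(-delta_H(s, i, j)) if s[i, j] == 0 else 1.0)
            if rng.random() < rate * dt:
                s[i, j] = 1 - s[i, j]
        return s

    for _ in range(100):
        sigma = sweep(sigma, 0.05)
    print("area coverage sigma_X:", sigma.sum())   # the quantity coarse-grained in [34]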

This Ising-type modeling framework is extended in [33] to represent the variability of organized tropical convection (OTC). A multi-type order parameter is introduced to mimic the multimodal nature of OTC. Based on observations, tropical convective systems (TCS) are characterized by three cloud types: cumulus congestus clouds, whose height does not exceed the freezing level, develop when the atmosphere is dry and there is convective instability (positive CAPE); in return, congestus clouds moisten the environment for deep convective towers; and stratiform clouds that develop in the upper troposphere lag deep convection as a natural freezing phase in the upper troposphere. Accordingly, the new order parameter $\sigma$ takes the multiple values 0, 1, 2, 3 on a given lattice site, according to whether the given site is, respectively, clear sky or occupied by a congestus, deep, or stratiform cloud.

Similar Arrhenius-type dynamics are used to build transition rates, resulting in an ergodic Markov process with a well-defined equilibrium measure. Unphysical transitions of congestus to stratiform, stratiform to deep, stratiform to congestus, clear to stratiform, and deep to congestus were eliminated by setting the associated rates to zero. When local interactions are ignored, the equilibrium measure and the transition rates depend only on the large-scale climate variables $U$, where CAPE and midlevel moisture are used as triggers, and the coarse-graining process is carried out with exact statistics. It leads to a multidimensional birth-death process with immigration for the area fractions of the associated three cloud types. The stochastic multicloud model (SMCM) is used very successfully in [20, 21] to capture the unresolved variability of organized convection in a toy GCM. The simulation of convectively coupled gravity waves and mean climatology were improved drastically when compared to their deterministic counterparts. The realistic statistical behavior of the SMCM is successfully assessed against observations in [56]. Local interaction effects are reintroduced in [32] where a coarse-graining approximation based on conditional expectation is used to recover the multidimensional birth-death process dynamics with local interactions. A Bayesian methodology for inferring key parameters for the SMCM is developed and validated in [12]. A review of the basic methodology of the CIN and SMCM models, which is suitable for undergraduates, is found in [31].

Subsequently, [15] combined the conditionalMarkov chain methodology with elements from theSMCM [33]. They applied it to deep convectionbut without making use of the Arrhenius functionalforms of the transition rates in terms of the large-scalevariables (as was done in [33]). Similar to [16], LESdata was used for estimation of the Markov chaintransition probabilities. The inferred stochastic modelin [15] was well capable of generating cloud fractionsvery similar to those observed in the LES data. Whilethe main cloud types of the original SMCM werepreserved, an important improvement in [15] residesin the addition of a fifth state for shallow cumulusclouds. As an experiment, direct spatial coupling ofthe Markov chains on the lattice was also consideredin [15]. Such coupling amounts to the structure ofa stochastic cellular automaton (SCA). Without thisdirect coupling, the Markov chains are still coupled,

Page 99: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

Stochastic and Statistical Methods in Climate, Atmosphere, and Ocean Science 1385

S

but indirectly, through their interaction with the large-scale variables (see, e.g., [9]).

References

1. AghaKouchak, A., Easterling, D., Hsu, K., Schubert, S., Sorooshian, S. (eds.): Extremes in a Changing Climate, p. 423. Springer, Dordrecht (2013)
2. Arnold, L.: Hasselmann's program revisited: the analysis of stochasticity in deterministic climate models. In: Imkeller, P., von Storch, J.-S. (eds.) Stochastic Climate Models. Birkhäuser, Basel (2001)
3. Bellone, E., Hughes, J.P., Guttorp, P.: A hidden Markov model for downscaling synoptic atmospheric patterns to precipitation amounts. Clim. Res. 15, 1–12 (2000)
4. Berner, J.: Linking nonlinearity and non-Gaussianity of planetary wave behavior by the Fokker-Planck equation. J. Atmos. Sci. 62, 2098–2117 (2005)
5. Bond, N.A., Harrison, D.E.: The Pacific decadal oscillation, air-sea interaction and central north Pacific winter atmospheric regimes. Geophys. Res. Lett. 27, 731–734 (2000)
6. Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Quart. J. R. Meteorol. Soc. 125, 2887–2908 (1999)
7. Coles, S.: An Introduction to Statistical Modeling of Extreme Values, p. 208. Springer, London (2001)
8. Crommelin, D.T.: Observed non-diffusive dynamics in large-scale atmospheric flow. J. Atmos. Sci. 61, 2384–2396 (2004)
9. Crommelin, D.T., Vanden-Eijnden, E.: Subgrid-scale parameterization with conditional Markov chains. J. Atmos. Sci. 65, 2661–2675 (2008)
10. Davoudi, J., McFarlane, N.A., Birner, T.: Fluctuation of mass flux in a cloud-resolving simulation with interactive radiation. J. Atmos. Sci. 67, 400–418 (2010)
11. de Haan, L., Ferreira, A.: Extreme Value Theory: An Introduction, p. 417. Springer, New York (2006)
12. de la Chevrotière, M., Khouider, B., Majda, A.: Calibration of the stochastic multicloud model using Bayesian inference. SIAM J. Sci. Comput. 36(3), B538–B560 (2014)
13. DelSole, T.: Stochastic models of quasi-geostrophic turbulence. Surv. Geophys. 25, 107–149 (2004)
14. DelSole, T., Farrel, B.F.: Quasi-linear equilibration of a thermally maintained, stochastically excited jet in a quasi-geostrophic model. J. Atmos. Sci. 53, 1781–1797 (1996)
15. Dorrestijn, J., Crommelin, D.T., Biello, J.A., Böing, S.J.: A data-driven multicloud model for stochastic parameterization of deep convection. Phil. Trans. R. Soc. A 371, 20120374 (2013)
16. Dorrestijn, J., Crommelin, D.T., Siebesma, A.P., Jonker, H.J.J.: Stochastic parameterization of shallow cumulus convection estimated from high-resolution model data. Theor. Comput. Fluid Dyn. 27, 133–148 (2013)
17. Frankignoul, C., Hasselmann, K.: Stochastic climate models, Part II: application to sea-surface temperature anomalies and thermocline variability. Tellus 29, 289–305 (1977)
18. Franzke, C., Majda, A.: Low order stochastic mode reduction for a prototype atmospheric GCM. J. Atmos. Sci. 63, 457–479 (2006)
19. Franzke, C., Majda, A., Vanden-Eijnden, E.: Low-order stochastic mode reduction for a realistic barotropic model climate. J. Atmos. Sci. 62, 1722–1745 (2005)
20. Frenkel, Y., Majda, A.J., Khouider, B.: Using the stochastic multicloud model to improve tropical convective parameterization: a paradigm example. J. Atmos. Sci. 69, 1080–1105 (2012)
21. Frenkel, Y., Majda, A.J., Khouider, B.: Stochastic and deterministic multicloud parameterizations for tropical convection. Clim. Dyn. 41, 1527–1551 (2013)
22. Giannakis, D., Majda, A.J.: Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability. Proc. Natl. Acad. Sci. USA 109, 2222–2227 (2012)
23. Hasselmann, K.: Stochastic climate models. Part I: theory. Tellus 28, 473–485 (1976)
24. Hasselmann, K.: PIPs and POPs: the reduction of complex dynamical systems using principal interaction and oscillation patterns. J. Geophys. Res. 93(D9), 11015–11021 (1988)
25. Horenko, I.: Nonstationarity in multifactor models of discrete jump processes, memory, and application to cloud modeling. J. Atmos. Sci. 68, 1493–1506 (2011)
26. Imkeller, P., von Storch, J.-S. (eds.): Stochastic Climate Models. Birkhäuser, Basel (2001)
27. Jolliffe, I.T.: Principal Component Analysis, 2nd edn. Springer, New York (2002)
28. Katsoulakis, M., Majda, A.J., Vlachos, D.: Coarse-grained stochastic processes and Monte-Carlo simulations in lattice systems. J. Comp. Phys. 186, 250–278 (2003)
29. Katz, R.W., Naveau, P.: Editorial: special issue on statistics of extremes in weather and climate. Extremes 13, 107–108 (2010)
30. Kessler, M., Lindner, A., Sørensen, M. (eds.): Statistical Methods for Stochastic Differential Equations. CRC, Boca Raton (2012)
31. Khouider, B.: Markov-jump stochastic models for organized tropical convection. In: Yang, X.-S. (ed.) Mathematical Modeling With Multidisciplinary Applications. Wiley (2013). ISBN: 978-1-1182-9441-3
32. Khouider, B.: A stochastic coarse grained multi-type particle interacting model for tropical convection. Commun. Math. Sci. 12, 1379–1407 (2014)
33. Khouider, B., Biello, J., Majda, A.J.: A stochastic multicloud model for tropical convection. Commun. Math. Sci. 8, 187–216 (2010)
34. Khouider, B., Majda, A.J., Katsoulakis, M.: Coarse grained stochastic models for tropical convection. Proc. Natl. Acad. Sci. USA 100, 11941–11946 (2003)
35. Khouider, B., Majda, A.J., Stechmann, S.: Climate science in the tropics: waves, vortices, and PDEs. Nonlinearity 26, R1–R68 (2013)
36. Kleeman, R., Moore, A.: A theory for the limitation of ENSO predictability due to stochastic atmospheric transients. J. Atmos. Sci. 54, 753–767 (1997)
37. Kondrashov, D., Kravtsov, S., Ghil, M.: Empirical mode reduction in a model of extra-tropical low-frequency variability. J. Atmos. Sci. 63, 1859–1877 (2006)


38. Lin, J.B.W., Neelin, D.: Toward stochastic deep convectiveparameterization in general circulation models. Geophy.Res. Lett. 27, 3691–3694 (2000)

39. Lorenz, E.N.: Predictability: a problem partly solved. In:Proceedings, Seminar on Predictability ECMWF, Reading,vol. 1, pp. 1–18 (1996)

40. Majda, A., Franzke, C., Crommelin, D.: Normal forms forreduced stochastic climate models. Proc. Natl. Acad. Sci.USA 16, 3649–3653 (2009)

41. Majda, A.J., Franzke, C.L., Fischer, A., Crommelin, D.T.:Distinct metastable atmospheric regimes despite nearlyGaussian statistics: a paradigm model. Proc. Natl. Acad. Sci.USA 103, 8309–8314 (2006)

42. Majda, A.J., Franzke, C.L., Khouider, B.: An applied mathe-matics perspective on stochastic modelling for climate. Phil.Trans. R. Soc. 366, 2427–2453 (2008)

43. Majda, A.J., Khouider, B.: Stochastic and mesoscopic mod-els for tropical convection. Proc. Natl. Acad. Sci. 99,1123–1128 (2002)

44. Majda, A.J., Timofeyev, I., Vanden-Eijnden, E.: Models forstochastic climate prediction. Proc. Natl. Acad. Sci. 96,14687–14691 (1999)

45. Majda, A.J., Timofeyev, I., Vanden-Eijnden, E.: A mathe-matical framework for stochastic climate models. Commun.Pure Appl. Math. LIV, 891–974 (2001)

46. Majda, A.J., Timofeyev, I., Vanden-Eijnden, E.: A prioritests of a stochastic mode reduction strategy. Physica D 170,206–252 (2002)

47. Majda A., Timofeyev, I., Vanden-Eijnden, E.: Stochasticmodels for selected slow variables in large deterministicsystems. Nonlinearity 19, 769–794 (2006)

48. Mo, K.C., Ghil, M.: Statistics and dynamics of persistentanomalies. J. Atmos. Sci. 44, 877–901 (1987)

49. Monahan, A.H.: Nonlinear principal component analysisby neural networks: theory and application to the Lorenzsystem. J. Clim. 13, 821–835 (2000)

50. Monahan, A.H., Fyfe, J.C., Ambaum, M.H.P., Stephenson,D.B., North, G.R.: Empirical orthogonal functions: themedium is the message. J. Clim. 22, 6501–6514 (2009)

51. Newman, M., Sardeshmukh, P., Penland, C.: Stochasticforcing of the wintertime extratropical flow. J. Atmos. Sci.54, 435–455 (1997)

52. Palmer, T., Williams, P. (eds.): Stochastic Physics and Cli-mate Modelling. Cambridge University Press, Cambridge(2009)

53. Pasmanter, R.A., Timmermann, A.: Cyclic Markov chainswith an application to an intermediate ENSO model. Non-linear Process. Geophys. 10, 197–210 (2003)

54. Penland, C., Ghil, M.: Forecasting northern hemisphere700-mb geopotential height anomalies using empirical nor-mal modes. Mon. Weather Rev. 121, 2355–2372 (1993)

55. Penland, C., Sardeshmukh, P.D.: The optimal growth oftropical sea surface temperature anomalies. J. Clim. 8,1999–2024 (1995)

56. Peters, K., Jakob, C., Davies, L., Khouider, B., Majda, A.:Stochastic behaviour of tropical convection in observationsand a multicloud model. J. Atmos. Sci. 70, 3556–3575(2013)

57. Plant, R.S., Craig, G.C.: A stochastic parameterization fordeep convection based on equilibrium statistics. J. Atmos.Sci. 65, 87–105 (2008)

58. Prakasa Rao, B.L.S.: Statistical Inference for Diffusion TypeProcesses. Arnold Publishers, London (1999)

59. Preisendorfer, R.W.: Principal Component Analysis in Me-teorology and Oceanography. Elsevier, Amsterdam (1988)

60. Sura, P., Newman, M., Penland, C., Sardeshmukh, P.: Mul-tiplicative noise and non-Gaussianity: a paradigm for atmo-spheric regimes. J. Atmos. Sci. 62, 1391–1409 (2005)

61. Von Storch, H., Zwiers, F.W.: Statistical Analysis in ClimateResearch. Cambridge University Press, Cambridge (1999)

62. Wilks, D.S.: Statistical Methods in the Atmospheric Sci-ences, 3rd edn., p. 704. Academic, Oxford (2011)

63. Zucchini, W., Guttorp, P.: A Hidden Markov model forspace-time precipitation. Water Resour. Res. 27, 1917–1923(1991)

Stochastic Eulerian-Lagrangian Methods

Paul J. Atzberger
Department of Mathematics, University of California Santa Barbara (UCSB), Santa Barbara, CA, USA

Synonyms

Fluid-structure interaction; Fluid dynamics; Fluctuating hydrodynamics; Immersed Boundary Method; SELM; Statistical mechanics; Stochastic Eulerian Lagrangian method; Thermal fluctuations

Abstract

We present approaches for the study of fluid-structure interactions subject to thermal fluctuations. A mechanical description is utilized combining Eulerian and Lagrangian reference frames. We establish general conditions for the derivation of operators coupling these descriptions and for the derivation of stochastic driving fields consistent with statistical mechanics. We present stochastic numerical methods for the fluid-structure dynamics and methods to generate efficiently the required stochastic driving fields. To help establish the validity of the proposed approach, we perform analysis of the invariant probability distribution of the stochastic dynamics and relate our results to statistical mechanics. Overall, the presented approaches are expected to be applicable to a wide variety of systems involving fluid-structure interactions subject to thermal fluctuations.


Introduction

Recent scientific and technological advances motivate the study of fluid-structure interactions in physical regimes often involving very small length and time scales [26, 30, 35, 36]. This includes the study of microstructure in soft materials and complex fluids, the study of biological systems such as cell motility and microorganism swimming, and the study of processes within microfluidic and nanofluidic devices. At such scales thermal fluctuations play an important role and pose significant challenges in the study of such fluid-structure systems. Significant past work has been done on the formulation of descriptions for fluid-structure interactions subject to thermal fluctuations. To obtain descriptions tractable for analysis and numerical simulation, these approaches typically place an emphasis on approximations which retain only the structure degrees of freedom (eliminating the fluid dynamics). This often results in simplifications in the descriptions having substantial analytic and computational advantages. In particular, this eliminates the many degrees of freedom associated with the fluid and avoids having to resolve the potentially intricate and stiff stochastic dynamics of the fluid. These approaches have worked especially well for the study of bulk phenomena in free solution and the study of many types of complex fluids and soft materials [3, 9, 13, 17, 23].

Recent applications arising in the sciences and in technological fields present situations in which resolving the dynamics of the fluid may be important and even advantageous both for modeling and computation. This includes modeling the spectroscopic responses of biological materials [19, 25, 37], studying transport in microfluidic and nanofluidic devices [16, 30], and investigating dynamics in biological systems [2, 11]. There are also other motivations for representing the fluid explicitly and resolving its stochastic dynamics. This includes the development of hybrid fluid-particle models in which thermal fluctuations mediate important effects when coupling continuum and particle descriptions [12, 14], the study of hydrodynamic coupling and diffusion in the vicinity of surfaces having complicated geometries [30], and the study of systems in which there are many interacting mechanical structures [8, 27, 28]. To facilitate the development of methods for studying such phenomena in fluid-structure systems, we present a rather general formalism which captures essential features of the coupled stochastic dynamics of the fluid and structures.

To model the fluid-structure system, a mechanical description is utilized involving both Eulerian and Lagrangian reference frames. Such mixed descriptions arise rather naturally, since it is often convenient to describe the structure configurations in a Lagrangian reference frame while it is convenient to describe the fluid in an Eulerian reference frame. In practice, this presents a number of challenges for analysis and numerical studies. A central issue concerns how to couple the descriptions to represent accurately the fluid-structure interactions, while obtaining a coupled description which can be treated efficiently by numerical methods. Another important issue concerns how to account properly for thermal fluctuations in such approximate descriptions. This must be done carefully to be consistent with statistical mechanics. A third issue concerns the development of efficient computational methods. This requires discretizations of the stochastic differential equations and the development of efficient methods for numerical integration and stochastic field generation.

We present a set of approaches to address these issues. The formalism and general conditions for the operators which couple the Eulerian and Lagrangian descriptions are presented in section “Stochastic Eulerian Lagrangian Method.” We discuss a convenient description of the fluid-structure system useful for working with the formalism in practice in section “Derivations for the Stochastic Eulerian Lagrangian Method.” A derivation of the stochastic driving fields used to represent the thermal fluctuations is also presented in section “Derivations for the Stochastic Eulerian Lagrangian Method.” Stochastic numerical methods are discussed for the approximation of the stochastic dynamics and generation of stochastic fields in section “Computational Methodology.” To validate the methodology, we perform analysis of the invariant probability distribution of the stochastic dynamics of the fluid-structure formalism. We compare this analysis with results from statistical mechanics in section “Equilibrium Statistical Mechanics of SELM Dynamics.” A more detailed and comprehensive discussion of the approaches presented here can be found in our paper [6].


Stochastic Eulerian Lagrangian Method

To study the dynamics of fluid-structure interactions subject to thermal fluctuations, we utilize a mechanical description involving Eulerian and Lagrangian reference frames. Such mixed descriptions arise rather naturally, since it is often convenient to describe the structure configurations in a Lagrangian reference frame while it is convenient to describe the fluid in an Eulerian reference frame. In principle more general descriptions using other reference frames could also be considered. Descriptions for fluid-structure systems having these features can be described rather generally by the following dynamic equations:

$$\rho \frac{du}{dt} = L u + \Lambda[\Upsilon(v - \Gamma u)] + \lambda + f_{\mathrm{thm}} \quad (1)$$

$$m \frac{dv}{dt} = -\Upsilon(v - \Gamma u) - \nabla_X \Phi[X] + \zeta + F_{\mathrm{thm}} \quad (2)$$

$$\frac{dX}{dt} = v. \quad (3)$$

The $u$ denotes the velocity of the fluid, and $\rho$ the uniform fluid density. The $X$ denotes the configuration of the structure, and $v$ the velocity of the structure. The mass of the structure is denoted by $m$. To simplify the presentation, we treat here only the case when $\rho$ and $m$ are constant, but with some modifications these could also be treated as variable. The $\lambda, \zeta$ are Lagrange multipliers for imposed constraints, such as incompressibility of the fluid or a rigid body constraint of a structure. The operator $L$ is used to account for dissipation in the fluid, such as associated with Newtonian fluid stresses [1]. To account for how the fluid and structures are coupled, a few general operators are introduced, $\Gamma, \Upsilon, \Lambda$.

The linear operators $\Gamma, \Lambda, \Upsilon$ are used to model the fluid-structure coupling. The $\Gamma$ operator describes how a structure depends on the fluid flow, while $-\Upsilon$ is a negative definite dissipative operator describing the viscous interactions coupling the structure to the fluid. We assume throughout that this dissipative operator is symmetric, $\Upsilon = \Upsilon^T$. The linear operator $\Lambda$ is used to attribute a spatial location for the viscous interactions between the structure and fluid. The linear operators are assumed to have dependence only on the configuration degrees of freedom, $\Gamma = \Gamma[X]$, $\Lambda = \Lambda[X]$. We assume further that $\Upsilon$ does not have any dependence on $X$. For an illustration of the role these coupling operators play, see Fig. 1.

To account for the mechanics of structures, $\Phi[X]$ denotes the potential energy of the configuration $X$. The total energy associated with this fluid-structure system is given by

Stochastic Eulerian-Lagrangian Methods, Fig. 1 The description of the fluid-structure system utilizes both Eulerian and Lagrangian reference frames. The structure mechanics are often most naturally described using a Lagrangian reference frame. The fluid mechanics are often most naturally described using an Eulerian reference frame. The mapping $X(q)$ relates the Lagrangian reference frame to the Eulerian reference frame. The operator $\Lambda$ prescribes how structures are to be coupled to the fluid. The operator $\Gamma$ prescribes how the fluid is to be coupled to the structures. A variety of fluid-structure interactions can be represented in this way. This includes rigid and deformable bodies, membrane structures, polymeric structures, or point particles.


$$E[u, v, X] = \int_{\Omega} \frac{1}{2} \rho |u(y)|^2 \, dy + \frac{1}{2} m v^2 + \Phi[X]. \quad (4)$$

The first two terms give the kinetic energy of the fluid and structures. The last term gives the potential energy of the structures.

As we shall discuss, it is natural to consider coupling operators $\Lambda$ and $\Gamma$ which are adjoint in the sense

$$\int_{S} (\Gamma u)(q) \cdot v(q) \, dq = \int_{\Omega} u(x) \cdot (\Lambda v)(x) \, dx \quad (5)$$

for any $u$ and $v$. The $S$ and $\Omega$ denote the spaces used to parameterize respectively the structures and the fluid. We denote such an adjoint by $\Lambda = \Gamma^{\dagger}$ or $\Gamma = \Lambda^{\dagger}$. This adjoint condition can be shown to have the important consequence that the fluid-structure coupling conserves energy when $\Upsilon \to \infty$ in the inviscid and zero temperature limit.
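As a concrete illustration (our own added example, not part of the entry's formulation; it reflects a standard choice in immersed-boundary-type couplings), one may take the structure to be a single point particle at location $X$ coupled through a smoothed kernel function $\delta_a$ of width $a$:

$$\Gamma u = \int_{\Omega} \delta_a(y - X)\, u(y)\, dy, \qquad (\Lambda v)(x) = \delta_a(x - X)\, v.$$

A direct computation shows these satisfy the adjoint condition (5), since both pairings reduce to $\int_{\Omega} \delta_a(x - X)\, u(x) \cdot v \, dx$.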

To account for thermal fluctuations, a random force density $f_{\mathrm{thm}}$ is introduced in the fluid equations and $F_{\mathrm{thm}}$ in the structure equations. These account for spontaneous changes in the system momentum which occur as a result of the influence of unresolved microscopic degrees of freedom and unresolved events occurring in the fluid and in the fluid-structure interactions.

The thermal fluctuations consistent with the form of the total energy and relaxation dynamics of the system are taken into account by the introduction of stochastic driving fields in the momentum equations of the fluid and structures. The stochastic driving fields are taken to be Gaussian processes with mean zero and with $\delta$-correlation in time [29]. By the fluctuation-dissipation principle [29], these have covariances given by

$$\langle f_{\mathrm{thm}}(s) f_{\mathrm{thm}}^T(t) \rangle = -(2 k_B T) (L - \Lambda \Upsilon \Gamma) \, \delta(t - s) \quad (6)$$

$$\langle F_{\mathrm{thm}}(s) F_{\mathrm{thm}}^T(t) \rangle = (2 k_B T) \, \Upsilon \, \delta(t - s) \quad (7)$$

$$\langle f_{\mathrm{thm}}(s) F_{\mathrm{thm}}^T(t) \rangle = -(2 k_B T) \, \Lambda \Upsilon \, \delta(t - s). \quad (8)$$

We have used that $\Lambda = \Gamma^{\dagger}$ and $\Upsilon = \Upsilon^T$. We remark that the notation $g h^T$ which is used for the covariance operators should be interpreted as the tensor product. This notation is meant to suggest the analogue to the outer-product operation which holds in the discrete setting [5]. A more detailed discussion and derivation of the thermal fluctuations is given in section “Derivations for the Stochastic Eulerian Lagrangian Method.”

It is important to mention that some care must be taken when using the above formalism in practice and when choosing operators. An important issue concerns the treatment of the material derivative of the fluid, $du/dt = \partial u/\partial t + u \cdot \nabla u$. For stochastic systems the field $u$ is often highly irregular and not defined in a point-wise sense, but rather only in the sense of a generalized function (distribution) [10, 24]. To avoid these issues, we shall treat $du/dt = \partial u/\partial t$ in this initial presentation of the approach [6]. The SELM provides a rather general framework for the study of fluid-structure interactions subject to thermal fluctuations. To use the approach for specific applications requires the formulation of appropriate coupling operators $\Gamma$ and $\Lambda$ to model the fluid-structure interaction. We provide some concrete examples of such operators in the paper [6].

Formulation in Terms of Total Momentum Field

When working with the formalism in practice, it turns out to be convenient to reformulate the description in terms of a field describing the total momentum of the fluid-structure system at a given spatial location. As we shall discuss, this description results in simplifications in the stochastic driving fields. For this purpose, we define

$$p(x, t) = \rho u(x, t) + \Lambda[m v(t)](x). \quad (9)$$

The operator $\Lambda$ is used to give the distribution in space of the momentum associated with the structures for given configuration $X(t)$. Using this approach, the fluid-structure dynamics are described by

$$\frac{dp}{dt} = L u + \Lambda[-\nabla_X \Phi(X)] + (\nabla_X \Lambda[m v]) \cdot v + \lambda + g_{\mathrm{thm}} \quad (10)$$

$$m \frac{dv}{dt} = -\Upsilon(v - \Gamma u) - \nabla_X \Phi(X) + \zeta + F_{\mathrm{thm}} \quad (11)$$

$$\frac{dX}{dt} = v \quad (12)$$

where $u = \rho^{-1}(p - \Lambda[m v])$ and $g_{\mathrm{thm}} = f_{\mathrm{thm}} + \Lambda[F_{\mathrm{thm}}]$. The third term in the first equation arises from the dependence of $\Lambda$ on the configuration of the structures, $\Lambda[m v] = (\Lambda[X])[m v]$. The Lagrange


multipliers for imposed constraints are denoted by $\lambda, \zeta$. For the constraints, we use rather liberally the notation with the Lagrange multipliers denoted here not necessarily assumed to be equal to the previous definition. The stochastic driving fields are again Gaussian with mean zero and $\delta$-correlation in time [29]. The stochastic driving fields have the covariance structure given by

$$\langle g_{\mathrm{thm}}(s) g_{\mathrm{thm}}^T(t) \rangle = -(2 k_B T) \, L \, \delta(t - s) \quad (13)$$

$$\langle F_{\mathrm{thm}}(s) F_{\mathrm{thm}}^T(t) \rangle = (2 k_B T) \, \Upsilon \, \delta(t - s) \quad (14)$$

$$\langle g_{\mathrm{thm}}(s) F_{\mathrm{thm}}^T(t) \rangle = 0. \quad (15)$$

This formulation has the convenient feature that the stochastic driving fields become independent. This is a consequence of using the field for the total momentum, for which the dissipative exchange of momentum between the fluid and structure no longer arises. In the equations for the total momentum, the only source of dissipation remaining occurs from the stresses of the fluid. This approach simplifies the effort required to generate numerically the stochastic driving fields and will be used throughout.

Derivations for the Stochastic Eulerian Lagrangian Method

We now discuss formal derivations to motivate the stochastic differential equations used in each of the physical regimes. For this purpose, we do not present the most general derivation of the equations. For brevity, we make simplifying assumptions when convenient.

In the initial formulation of SELM, the fluid-structure system is described by

$$\rho \frac{du}{dt} = L u + \Lambda[\Upsilon(v - \Gamma u)] + \lambda + f_{\mathrm{thm}} \quad (16)$$

$$m \frac{dv}{dt} = -\Upsilon(v - \Gamma u) - \nabla_X \Phi(X) + \zeta + F_{\mathrm{thm}} \quad (17)$$

$$\frac{dX}{dt} = v. \quad (18)$$

The notation and operators appearing in these equations have been discussed in detail in section “Stochastic Eulerian Lagrangian Method.” For these equations, we focus primarily on the motivation for the stochastic driving fields used for the fluid-structure system.

For the thermal fluctuations of the system, we assume Gaussian random fields with mean zero and $\delta$-correlated in time. For such stochastic fields, the central challenge is to determine an appropriate covariance structure. For this purpose, we use the fluctuation-dissipation principle of statistical mechanics [22, 29]. For linear stochastic differential equations of the form

$$dZ_t = L Z_t \, dt + Q \, dB_t \quad (19)$$

the fluctuation-dissipation principle can be expressed as

$$G = Q Q^T = -(L C) - (L C)^T. \quad (20)$$

This relates the equilibrium covariance structure $C$ of the system to the covariance structure $G$ of the stochastic driving field. The operator $L$ accounts for the dissipative dynamics of the system. For Eqs. 16–18, the dissipative operators only appear in the momentum equations. This can be shown to have the consequence that there is no thermal forcing in the equation for $X(t)$; this will also be confirmed in section “Formulation in Terms of Total Momentum Field.” To simplify the presentation, we do not represent explicitly the stochastic dynamics of the structure configuration $X$.
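As a simple illustration of Eq. 20 (our own added example, not part of the original derivation), consider the scalar Ornstein-Uhlenbeck process $dZ_t = -\gamma Z_t \, dt + q \, dB_t$ with $\gamma > 0$. Here $L = -\gamma$ and the equilibrium variance is $C = q^2/(2\gamma)$, so the fluctuation-dissipation relation reads

$$G = Q Q^T = q^2 = -2 L C = 2\gamma \cdot \frac{q^2}{2\gamma},$$

which is consistent. In the SELM setting, the same relation is applied with $L$ and $C$ the operator-valued analogues below.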

For the fluid-structure system, it is convenient to work with the stochastic driving fields by defining

$$q = [\rho^{-1} f_{\mathrm{thm}}, \; m^{-1} F_{\mathrm{thm}}]^T. \quad (21)$$

The field $q$ formally is given by $q = Q \, dB_t/dt$ and determined by the covariance structure $G = Q Q^T$. This covariance structure is determined by the fluctuation-dissipation principle expressed in Eq. 20 with

$$L = \begin{bmatrix} \rho^{-1}(L - \Lambda \Upsilon \Gamma) & \rho^{-1} \Lambda \Upsilon \\ m^{-1} \Upsilon \Gamma & -m^{-1} \Upsilon \end{bmatrix} \quad (22)$$

$$C = \begin{bmatrix} \rho^{-1} k_B T \, I & 0 \\ 0 & m^{-1} k_B T \, I \end{bmatrix}. \quad (23)$$

The $I$ denotes the identity operator. The covariance $C$ was obtained by considering the fluctuations at equilibrium. The covariance $C$ is easily found since


the Gibbs-Boltzmann distribution is a Gaussian with formal density $\rho(u, v) = \frac{1}{Z_0} \exp[-E/k_B T]$. The $Z_0$ is the normalization constant for $\rho$. The energy is given by Eq. 4. For this purpose, we need only consider the energy $E$ in the case when $\Phi = 0$. This gives the covariance structure

$$G = (2 k_B T) \begin{bmatrix} -\rho^{-2}(L - \Lambda \Upsilon \Gamma) & -m^{-1} \rho^{-1} \Lambda \Upsilon \\ -m^{-1} \rho^{-1} \Upsilon \Gamma & m^{-2} \Upsilon \end{bmatrix}. \quad (24)$$

To obtain this result, we use that $\Lambda = \Gamma^{\dagger}$ and $\Upsilon = \Upsilon^T$. From the definition of $q$, it is found that the covariance of the stochastic driving fields of SELM is given by Eqs. 6–8. This provides a description of the thermal fluctuations in the fluid-structure system.

Formulation in Terms of Total Momentum Field

It is convenient to reformulate the description of the fluid-structure system in terms of a field for the total momentum of the system associated with spatial location $x$. For this purpose we define

$$p(x, t) = \rho u(x, t) + \Lambda[m v(t)](x). \quad (25)$$

The operator $\Lambda$ is used to give the distribution in space of the momentum associated with the structures. Using this approach, the fluid-structure dynamics are described by

$$\frac{dp}{dt} = L u + \Lambda[-\nabla_X \Phi(X)] + (\nabla_X \Lambda[m v]) \cdot v + \lambda + g_{\mathrm{thm}} \quad (26)$$

$$m \frac{dv}{dt} = -\Upsilon(v - \Gamma u) - \nabla_X \Phi(X) + \zeta + F_{\mathrm{thm}} \quad (27)$$

$$\frac{dX}{dt} = v \quad (28)$$

where $u = \rho^{-1}(p - \Lambda[m v])$ and $g_{\mathrm{thm}} = f_{\mathrm{thm}} + \Lambda[F_{\mathrm{thm}}]$. The third term in the first equation arises from the dependence of $\Lambda$ on the configuration of the structures, $\Lambda[m v(t)] = (\Lambda[X])[m v(t)]$.

The thermal fluctuations are taken into account by two stochastic fields $g_{\mathrm{thm}}$ and $F_{\mathrm{thm}}$. The covariance of $g_{\mathrm{thm}}$ is obtained from

$$\begin{aligned} \langle g_{\mathrm{thm}} g_{\mathrm{thm}}^T \rangle &= \langle f_{\mathrm{thm}} f_{\mathrm{thm}}^T \rangle + \langle f_{\mathrm{thm}} F_{\mathrm{thm}}^T \rangle \Lambda^T + \Lambda \langle F_{\mathrm{thm}} f_{\mathrm{thm}}^T \rangle + \Lambda \langle F_{\mathrm{thm}} F_{\mathrm{thm}}^T \rangle \Lambda^T \\ &= (2 k_B T) \left( -L + \Lambda \Upsilon \Gamma - \Lambda \Upsilon \Lambda^T - \Lambda \Upsilon \Lambda^T + \Lambda \Upsilon \Lambda^T \right) \\ &= -(2 k_B T) L. \quad (29) \end{aligned}$$

This makes use of the adjoint property of the coupling operators, $\Lambda^{\dagger} = \Gamma$.

One particularly convenient feature of this reformulation is that the stochastic driving fields $F_{\mathrm{thm}}$ and $g_{\mathrm{thm}}$ become independent. This can be seen as follows:

$$\langle g_{\mathrm{thm}} F_{\mathrm{thm}}^T \rangle = \langle f_{\mathrm{thm}} F_{\mathrm{thm}}^T \rangle + \Lambda \langle F_{\mathrm{thm}} F_{\mathrm{thm}}^T \rangle = (2 k_B T)(-\Lambda \Upsilon + \Lambda \Upsilon) = 0. \quad (30)$$

This decoupling of the stochastic driving fields greatly reduces the computational effort to generate the fields with the required covariance structure. This shows that the covariance structure of the stochastic driving fields of SELM is given by Eqs. 13–15.

Computational Methodology

We now discuss briefly numerical methods for the SELM formalism. For concreteness we consider the specific case in which the fluid is Newtonian and incompressible. For now, the other operators of the SELM formalism will be treated rather generally. This case corresponds to the dissipative operator for the fluid

$$L u = \mu \Delta u. \quad (31)$$

The $\Delta$ denotes the Laplacian, $\Delta u = \partial_{xx} u + \partial_{yy} u + \partial_{zz} u$. The incompressibility of the fluid corresponds to the constraint

$$\nabla \cdot u = 0. \quad (32)$$

This is imposed by the Lagrange multiplier $\lambda$. By the Hodge decomposition, $\lambda$ is given by the gradient of a function $p$ with $\lambda = -\nabla p$. The $p$ can be interpreted as the local pressure of the fluid.

A variety of methods could be used in practice to discretize the SELM formalism, such as finite difference methods, spectral methods, and finite element methods [20, 32, 33]. We present here discretizations based on finite difference methods.

Numerical Semi-discretizations for Incompressible Newtonian Fluid

The Laplacian will be approximated by central differences on a uniform periodic lattice by

$$[L u]_m = \mu \sum_{j=1}^{3} \frac{u_{m + e_j} - 2 u_m + u_{m - e_j}}{\Delta x^2}. \quad (33)$$

The $m = (m_1, m_2, m_3)$ denotes the index of the lattice site. The $e_j$ denotes the standard basis vector in three dimensions. The incompressibility of the fluid will be approximated by imposing the constraint

$$[D \cdot u]_m = \sum_{j=1}^{3} \frac{u^{j}_{m + e_j} - u^{j}_{m - e_j}}{2 \Delta x}. \quad (34)$$

The superscripts denote the vector component. In practice, this will be imposed by computing the projection of a vector $u^*$ to the subspace $\{u \in \mathbb{R}^{3N} \mid D \cdot u = 0\}$, where $N$ is the total number of lattice sites. We denote this projection operation by

$$u = \wp u^*. \quad (35)$$
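As a minimal sketch of how $\wp$ can be applied in practice (our own illustration, assuming a cubic periodic grid; the function name and layout are hypothetical, and the entry itself defers discretization details to [6]), one can use that the centered-difference divergence (34) is diagonalized by the discrete Fourier transform, with symbol $i \sin(2\pi k_j/N^{1/3})/\Delta x$ in each direction:

```python
import numpy as np

def project_div_free(u, dx=1.0):
    """Project a periodic velocity field u (shape (3, N, N, N)) onto the
    subspace where the centered-difference divergence (34) vanishes.
    Uses that the difference operators are diagonal in the Fourier basis."""
    N = u.shape[1]
    k = np.fft.fftfreq(N) * N                      # integer wavenumbers
    # Fourier symbol of the centered-difference derivative
    sym = 1j * np.sin(2.0 * np.pi * k / N) / dx
    gx, gy, gz = np.meshgrid(sym, sym, sym, indexing="ij")
    g = np.stack([gx, gy, gz])                     # symbol vector, shape (3,N,N,N)
    u_hat = np.fft.fftn(u, axes=(1, 2, 3))
    g2 = (np.abs(g) ** 2).sum(axis=0)
    g2[g2 == 0.0] = 1.0                            # leave null modes untouched
    div_hat = (g * u_hat).sum(axis=0)              # discrete divergence per mode
    u_hat -= np.conj(g) * (div_hat / g2)           # subtract the gradient part
    return np.real(np.fft.ifftn(u_hat, axes=(1, 2, 3)))
```

One can verify mode by mode that the projected field has vanishing discrete divergence, since subtracting $\bar{g}(g \cdot \hat{u})/|g|^2$ removes exactly the component along the divergence symbol.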

The semi-discretized equations for SELM to be used in practice are

$$\frac{dp}{dt} = L u + \Lambda[-\nabla_X \Phi] + (\nabla_X \Lambda[m v]) \cdot v + \lambda + g_{\mathrm{thm}} \quad (36)$$

$$\frac{dv}{dt} = -\Upsilon[v - \Gamma u] + F_{\mathrm{thm}} \quad (37)$$

$$\frac{dX}{dt} = v. \quad (38)$$

The component $u_m = \rho^{-1}(p_m - \Lambda[m v]_m)$. Each of the operators now appearing is understood to be discretized. We discuss specific discretizations for $\Gamma$ and $\Lambda$ in paper [6]. To obtain the Lagrange multiplier $\lambda$ which imposes incompressibility, we use the projection operator and

$$\lambda = -(I - \wp) \left( L u + \Lambda[\Upsilon(v - \Gamma u)] + f_{\mathrm{thm}} \right). \quad (39)$$

In this expression, we let $f_{\mathrm{thm}} = g_{\mathrm{thm}} - \Lambda[F_{\mathrm{thm}}]$ for the particular realized values of the fields $g_{\mathrm{thm}}$ and $F_{\mathrm{thm}}$.

We remark that in fact the semi-discretized equations of the SELM formalism in this regime can also be given in terms of $u$ directly, which may provide a simpler approach in practice. The identity $f_{\mathrm{thm}} = g_{\mathrm{thm}} - \Lambda[F_{\mathrm{thm}}]$ could be used to efficiently generate the required stochastic driving fields in the equations for $u$. We present the reformulation here, since it more directly suggests the semi-discretized equations to be used for the reduced stochastic equations.

For this semi-discretization, we consider a total energy for the system given by

$$E[u, v, X] = \frac{\rho}{2} \sum_{m} |u(x_m)|^2 \, \Delta x_m^3 + \frac{m}{2} |v|^2 + \Phi[X]. \quad (40)$$

This is useful in formulating an adjoint condition (5) for the semi-discretized system. This can be derived by considering the requirements on the coupling operators $\Gamma$ and $\Lambda$ which ensure the energy is conserved when $\Upsilon \to \infty$ in the inviscid and zero temperature limit.

To obtain appropriate behaviors for the thermal fluctuations, it is important to develop stochastic driving fields which are tailored to the specific semi-discretizations used in the numerical methods. Once the stochastic driving fields are determined, which is the subject of the next section, the equations can be integrated in time using traditional methods for SDEs, such as the Euler-Maruyama method or a stochastic Runge-Kutta method [21]. More sophisticated integrators in time can also be developed to cope with sources of stiffness but are beyond the scope of this entry [7]. For each of the reduced equations, similar semi-discretizations can be developed as the one presented above.

Stochastic Driving Fields for Semi-discretizations

To obtain behaviors consistent with statistical mechanics, it is important that stochastic driving fields be used which are tailored to the specific numerical discretization employed [5–7, 15]. To ensure consistency with statistical mechanics, we will again use the fluctuation-dissipation principle but now apply it to the semi-discretized equations. For each regime, we then discuss the important issues arising in practice concerning the efficient generation of these stochastic driving fields.


Formulation in Terms of Total Momentum Field

To obtain the covariance structure for this regime, we apply the fluctuation-dissipation principle as expressed in Eq. 20 to the semi-discretized Eqs. 36–38. This gives the covariance

$$G = -2 L C = (2 k_B T) \begin{bmatrix} -\rho^{-2} \Delta x^{-3} L & 0 & 0 \\ 0 & m^{-2} \Upsilon & 0 \\ 0 & 0 & 0 \end{bmatrix}. \quad (41)$$

The factor of $\Delta x^{-3}$ arises from the form of the energy for the discretized system, which gives covariance $\rho^{-1} \Delta x^{-3} k_B T$ for the equilibrium fluctuations of the total momentum; see Eq. 40. In practice, achieving the covariance associated with the dissipative operator of the fluid $L$ is typically the most challenging to generate efficiently. This arises from the large number $N$ of lattice sites in the discretization.

One approach is to determine a factor $Q$ such that the block $G_{p,p} = Q Q^T$; subscripts indicate the block entry of the matrix. The required random field with covariance $G_{p,p}$ is then given by $g = Q \xi$, where $\xi$ is the uncorrelated Gaussian field with the covariance structure $I$. For the discretization used on the uniform periodic mesh, the matrices $L$ and $C$ are cyclic [31]. This has the important consequence that they are both diagonalizable in the discrete Fourier basis of the lattice. As a result, the field $f_{\mathrm{thm}}$ can be generated using the fast Fourier transform (FFT) with at most $O(N \log(N))$ computational steps. In fact, in this special case of the discretization, “random fluxes” at the cell faces can be used to generate the field in $O(N)$ computational steps [5]. Other approaches can be used to generate the random fields on nonperiodic meshes and on multilevel meshes; see [4, 5].
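To make the FFT idea concrete, here is a minimal sketch (our own illustration with hypothetical names; the entry's actual constructions are in [4, 5]) of sampling a Gaussian field with a prescribed symmetric circulant covariance on a periodic 1D lattice by filtering white noise with the square root of the eigenvalue symbol:

```python
import numpy as np

def sample_circulant_gaussian(c_first_row, rng=None):
    """Sample g ~ N(0, G), where G is the symmetric circulant covariance
    matrix with first row c_first_row. Circulant matrices are diagonalized
    by the DFT, so filtering white noise by sqrt of the eigenvalue symbol
    yields the desired covariance in O(N log N) operations."""
    rng = rng or np.random.default_rng()
    lam = np.fft.fft(c_first_row).real   # eigenvalues (real for symmetric G)
    lam = np.maximum(lam, 0.0)           # guard against round-off negatives
    w = rng.standard_normal(len(c_first_row))
    return np.fft.ifft(np.sqrt(lam) * np.fft.fft(w)).real
```

Since the filter matrix $A = F^{-1} \sqrt{\Lambda} F$ is real symmetric here, the sample has covariance $A A^T = F^{-1} \Lambda F = G$, as required.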

Equilibrium Statistical Mechanics of SELM Dynamics

We now discuss how the SELM formalism and the presented numerical methods capture the equilibrium statistical mechanics of the fluid-structure system. This is done through an analysis of the invariant probability distribution of the stochastic dynamics. For the fluid-structure systems considered, the appropriate probability distribution is given by the Gibbs-Boltzmann distribution

$$\rho_{\mathrm{GB}}(z) = \frac{1}{Z} \exp\left[ -E(z)/k_B T \right]. \quad (42)$$

The $z$ is the state of the system, $E$ is the energy, $k_B$ is Boltzmann's constant, $T$ is the system temperature, and $Z$ is a normalization constant for the distribution [29]. We show that this Gibbs-Boltzmann distribution is the equilibrium distribution of both the full stochastic dynamics and the reduced stochastic dynamics in each physical regime.

We present here both a verification of the invariance of the Gibbs-Boltzmann distribution for the general formalism and for numerical discretizations of the formalism. The verification is rather formal for the undiscretized formalism, given technical issues which would need to be addressed for such an infinite dimensional dynamical system. However, the verification is rigorous for the semi-discretization of the formalism, which yields a finite dimensional dynamical system. The latter is likely the most relevant case in practice. Given the nearly identical calculations involved in the verification for the general formalism and its semi-discretizations, we use a notation in which the key differences between the two cases primarily arise in the definition of the energy. In particular, the energy is understood to be given by Eq. 4 when considering the general SELM formalism and Eq. 40 when considering semi-discretizations.

Formulation in Terms of Total Momentum Field

The stochastic dynamics given by Eqs. 10–12 is a change of variable of the full stochastic dynamics of the SELM formalism given by Eqs. 1–3. Thus verifying the invariance using the reformulated description is also applicable to Eqs. 1–3 and vice versa. To verify the invariance in the other regimes, it is convenient to work with the reformulated description. The energy associated with the reformulated description is given by

$$E[p, v, X] = \frac{1}{2\rho} \int |p(y) - \Lambda[m v](y)|^2 \, dy + \frac{m}{2} |v|^2 + \Phi[X]. \quad (43)$$

The energy associated with the semi-discretization is

$$E[p, v, X] = \frac{1}{2\rho} \sum_{m} |p(x_m) - \Lambda[m v]_m|^2 \, \Delta x_m^3 + \frac{m}{2} |v|^2 + \Phi[X]. \quad (44)$$


The probability density $\rho(p, v, X, t)$ for the current state of the system under the SELM dynamics is governed by the Fokker-Planck equation

$$\frac{\partial \rho}{\partial t} = -\nabla \cdot J \quad (45)$$

with probability flux

$$J = \begin{bmatrix} L u + \Lambda[-\nabla_X \Phi] + (\nabla_X \Lambda) \cdot v + \lambda \\ -\Upsilon(v - \Gamma u) - \nabla_X \Phi + \zeta \\ v \end{bmatrix} \rho - \frac{1}{2} (\nabla \cdot G) \rho - \frac{1}{2} G \nabla \rho. \quad (46)$$

The covariance operator $G$ is associated with the Gaussian field $g = [g_{\mathrm{thm}}, F_{\mathrm{thm}}, 0]^T$ by $\langle g(s) g^T(t) \rangle = G \delta(t - s)$.

In this regime, $G$ is given by Eq. 13 or 41. In the notation $[\nabla \cdot G(z)]_i = \partial_{z_j} G_{ij}(z)$ with the summation convention for repeated indices. To simplify the notation, we have suppressed denoting the specific functions on which each of the operators acts; see Eqs. 10–12 for these details.

The requirement that the Gibbs-Boltzmann distribution $\rho_{\mathrm{GB}}$ given by Eq. 42 be invariant under the stochastic dynamics is equivalent to the distribution yielding $\nabla \cdot J = 0$. We find it convenient to group terms and express this condition as

$$\nabla \cdot J = A_1 + A_2 + \nabla \cdot A_3 + \nabla \cdot A_4 = 0 \quad (47)$$

where

$$\begin{aligned}
A_1 &= \left[ \left( \Lambda[-\nabla_X \Phi] + \nabla_X \Lambda \cdot v + \lambda_1 \right) \cdot \nabla_p E + \left( -\nabla_X \Phi + \zeta_1 \right) \cdot \nabla_v E + v \cdot \nabla_X E \right] (-k_B T)^{-1} \rho_{\mathrm{GB}} \\
A_2 &= \left[ \nabla_p \cdot \left( \Lambda[-\nabla_X \Phi] + \nabla_X \Lambda \cdot v + \lambda_1 \right) + \nabla_v \cdot \left( -\nabla_X \Phi + \zeta_1 \right) + \nabla_X \cdot v \right] \rho_{\mathrm{GB}} \\
A_3 &= -\frac{1}{2} (\nabla \cdot G) \rho_{\mathrm{GB}} \\
A_4 &= \begin{bmatrix}
L u + \lambda_2 + \left( G_{pp} \nabla_p E + G_{pv} \nabla_v E + G_{pX} \nabla_X E \right) (2 k_B T)^{-1} \\
-\Upsilon(v - \Gamma u) + \zeta_2 + \left( G_{vp} \nabla_p E + G_{vv} \nabla_v E + G_{vX} \nabla_X E \right) (2 k_B T)^{-1} \\
\left( G_{Xp} \nabla_p E + G_{Xv} \nabla_v E + G_{XX} \nabla_X E \right) (2 k_B T)^{-1}
\end{bmatrix} \rho_{\mathrm{GB}}. \quad (48)
\end{aligned}$$

We assume here that the Lagrange multipliers can be split $\lambda = \lambda_1 + \lambda_2$ and $\zeta = \zeta_1 + \zeta_2$ to impose the constraints by considering in isolation different terms contributing to the dynamics; see Eq. 48. This is always possible for linear constraints. The block entries of the covariance operator $G$ are denoted by $G_{i,j}$ with $i, j \in \{p, v, X\}$. For the energy of the discretized system given by Eq. 44, we have

$$\nabla_{p_n} E = u(x_n) \, \Delta x_n^3 \quad (49)$$

$$\nabla_{v_q} E = \sum_{m} u(x_m) \cdot \left( -\nabla_{v_q} \Lambda[m v]_m \right) \Delta x_m^3 + m v_q \quad (50)$$

$$\nabla_{X_q} E = \sum_{m} u(x_m) \cdot \left( -\nabla_{X_q} \Lambda[m v]_m \right) \Delta x_m^3 + \nabla_{X_q} \Phi, \quad (51)$$

where $u = \rho^{-1}(p - \Lambda[m v])$. Similar expressions for the energy of the undiscretized formalism can be obtained by using the calculus of variations [18].

We now consider $\nabla \cdot J$ and each term $A_1, A_2, A_3, A_4$. The term $A_1$ can be shown to be the time derivative of the energy, $A_1 = dE/dt$, when considering only a subset of the contributions to the dynamics. Thus, conservation of the energy under this restricted dynamics would result in $A_1$ being zero. For the SELM formalism, we find by direct substitution of the gradients of $E$ given by Eqs. 49–51 into Eq. 48 that $A_1 = 0$. When there are constraints, it is important to consider only admissible states $(p, v, X)$. This shows that in the inviscid and zero temperature limit of SELM, the resulting dynamics are nondissipative. This property imposes constraints on the coupling operators and can be viewed as a further motivation for the adjoint conditions imposed in Eq. 5.


The term $A_2$ gives the compressibility of the phase-space flow generated by the nondissipative dynamics of the SELM formalism. The flow is generated by the vector field $(\Lambda[-\nabla_X \Phi] + \nabla_X \Lambda \cdot v + \lambda_1, \; -\nabla_X \Phi + \zeta_1, \; v)$ on the phase-space $(p, v, X)$. When this term is nonzero, there are important implications for the Liouville theorem and statistical mechanics of the system [34]. For the current regime, we have $A_2 = 0$, since in the divergence each component of the vector field is seen to be independent of the variable with respect to which the derivative is computed. This shows that in the inviscid and zero temperature limit of SELM, the phase-space flow is incompressible. For the reduced SELM descriptions, we shall see this is not always the case.

The term $A_3$ corresponds to fluxes arising from multiplicative features of the stochastic driving fields. When the covariance $G$ has a dependence on the current state of the system, this can result in possible changes in the amplitude and correlations in the fluctuations. These changes can yield asymmetries in the stochastic dynamics which manifest as a net probability flux. In the SELM formalism, it is found that in the divergence of $G$, each contributing entry is independent of the variable with respect to which the derivative is being computed. This shows that for the SELM dynamics there are no such probability fluxes, $A_3 = 0$.

The last term $A_4$ accounts for the fluxes arising from the primarily dissipative dynamics and the stochastic driving fields. This term is calculated by substituting the gradients of the energy given by Eqs. 49–51 and using the choice of covariance structure given by Eq. 13 or 41. By direct substitution this term is found to be zero, $A_4 = 0$.

This shows the invariance of the Gibbs-Boltzmann distribution under the SELM dynamics. This provides a rather strong validation of the stochastic driving fields introduced for the SELM formalism. This shows the SELM stochastic dynamics are consistent with equilibrium statistical mechanics [29].

Conclusions

An approach for fluid-structure interactions subject to thermal fluctuations was presented based on a mechanical description utilizing both Eulerian and Lagrangian reference frames. General conditions were established for operators coupling these descriptions. A reformulated description was presented for the stochastic dynamics of the fluid-structure system having convenient features for analysis and for computational methods. Analysis was presented establishing for the SELM stochastic dynamics that the Gibbs-Boltzmann distribution is invariant. The SELM formalism provides a general framework for the development of computational methods for applications requiring a consistent treatment of structure elastic mechanics, hydrodynamic coupling, and thermal fluctuations. A more detailed and comprehensive discussion of SELM can be found in our paper [6].

Acknowledgements The author P. J. A. acknowledges support from research grant NSF CAREER DMS-0956210.

References

1. Acheson, D.J.: Elementary Fluid Dynamics. Oxford Applied Mathematics and Computing Science Series. Oxford University Press/Clarendon Press, Oxford/New York (1990)

2. Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walker, P.: Molecular Biology of the Cell. Garland Publishing, New York (2002)

3. Armstrong, R.C., Byron Bird, R., Hassager, O.: Dynamics of Polymeric Liquids, Vols. I, II. Wiley, New York (1987)

4. Atzberger, P.: Spatially adaptive stochastic multigrid methods for fluid-structure systems with thermal fluctuations, technical report (2010). arXiv:1003.2680

5. Atzberger, P.J.: Spatially adaptive stochastic numerical methods for intrinsic fluctuations in reaction-diffusion systems. J. Comput. Phys. 229, 3474–3501 (2010)

6. Atzberger, P.J.: Stochastic Eulerian Lagrangian methods for fluid-structure interactions with thermal fluctuations. J. Comput. Phys. 230, 2821–2837 (2011)

7. Atzberger, P.J., Kramer, P.R., Peskin, C.S.: A stochastic immersed boundary method for fluid-structure dynamics at microscopic length scales. J. Comput. Phys. 224, 1255–1292 (2007)

8. Banchio, A.J., Brady, J.F.: Accelerated Stokesian dynamics: Brownian motion. J. Chem. Phys. 118, 10323–10332 (2003)

9. Brady, J.F., Bossis, G.: Stokesian dynamics. Annu. Rev. Fluid Mech. 20, 111–157 (1988)

10. Da Prato, G., Zabczyk, J.: Stochastic Equations in Infinite Dimensions. Cambridge University Press, Cambridge/New York (1992)

11. Danuser, G., Waterman-Storer, C.M.: Quantitative fluorescent speckle microscopy of cytoskeleton dynamics. Annu. Rev. Biophys. Biomol. Struct. 35, 361–387 (2006)

12. De Fabritiis, G., Serrano, M., Delgado-Buscalioni, R., Coveney, P.V.: Fluctuating hydrodynamic modeling of fluids at the nanoscale. Phys. Rev. E 75, 026307 (2007)

13. Doi, M., Edwards, S.F.: The Theory of Polymer Dynamics. Oxford University Press, New York (1986)

14. Donev, A., Bell, J.B., Garcia, A.L., Alder, B.J.: A hybrid particle-continuum method for hydrodynamics of complex fluids. SIAM J. Multiscale Model. Simul. 8, 871–911 (2010)

15. Donev, A., Vanden-Eijnden, E., Garcia, A.L., Bell, J.B.: On the accuracy of finite-volume schemes for fluctuating hydrodynamics. Commun. Appl. Math. Comput. Sci. 5(2), 149–197 (2010)

16. Eijkel, J.C.T., Napoli, M., Pennathur, S.: Nanofluidic technology for biomolecule applications: a critical review. Lab on a Chip 10, 957–985 (2010)

17. Ermak, D.L., McCammon, J.A.: Brownian dynamics with hydrodynamic interactions. J. Chem. Phys. 69, 1352–1360 (1978)

18. Gelfand, I.M., Fomin, S.V.: Calculus of Variations. Dover, Mineola (2000)

19. Gotter, R., Kroy, K., Frey, E., Barmann, M., Sackmann, E.: Dynamic light scattering from semidilute actin solutions: a study of hydrodynamic screening, filament bending stiffness, and the effect of tropomyosin/troponin-binding. Macromolecules 29, 30–36 (1996)

20. Gottlieb, D., Orszag, S.A.: Numerical Analysis of Spectral Methods: Theory and Applications. SIAM, Philadelphia (1993)

21. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Springer, Berlin/New York (1992)

22. Landau, L.D., Lifshitz, E.M.: Course of Theoretical Physics. Statistical Physics, vol. 9. Pergamon Press, Oxford (1980)

23. Larson, R.G.: The Structure and Rheology of Complex Fluids. Oxford University Press, New York (1999)

24. Lieb, E.H., Loss, M.: Analysis. American Mathematical Society, Providence (2001)

25. Mezei, F., Pappas, C., Gutberlet, T.: Neutron Spin Echo Spectroscopy: Basics, Trends, and Applications. Springer, Berlin/New York (2003)

26. Moffitt, J.R., Chemla, Y.R., Smith, S.B., Bustamante, C.: Recent advances in optical tweezers. Annu. Rev. Biochem. 77, 205–228 (2008)

27. Peskin, C.S.: Numerical analysis of blood flow in the heart. J. Comput. Phys. 25, 220–252 (1977)

28. Peskin, C.S.: The immersed boundary method. Acta Numerica 11, 479–517 (2002)

29. Reichl, L.E.: A Modern Course in Statistical Physics. Wiley, New York (1998)

30. Squires, T.M., Quake, S.R.: Microfluidics: fluid physics at the nanoliter scale. Rev. Mod. Phys. 77, 977–1026 (2005)

31. Strang, G.: Linear Algebra and Its Applications. Harcourt Brace Jovanovich College Publishers, San Diego (1988)

32. Strang, G., Fix, G.: An Analysis of the Finite Element Method. Wellesley-Cambridge Press, Wellesley (2008)

33. Strikwerda, J.C.: Finite Difference Schemes and Partial Differential Equations. SIAM, Philadelphia (2004)

34. Tuckerman, M.E., Mundy, C.J., Martyna, G.J.: On the classical statistical mechanics of non-Hamiltonian systems. EPL (Europhys. Lett.) 45, 149–155 (1999)

35. Valentine, M.T., Weeks, E.R., Gisler, T., Kaplan, P.D., Yodh, A.G., Crocker, J.C., Weitz, D.A.: Two-point microrheology of inhomogeneous soft materials. Phys. Rev. Lett. 85, 888–891 (2000)

36. Watari, N., Doi, M., Larson, R.G.: Fluidic trapping of deformable polymers in microflows. Phys. Rev. E 78, 011801 (2008)

37. Watson, M.C., Brown, F.L.H.: Interpreting membrane scattering experiments at the mesoscale: the contribution of dissipation within the bilayer. Biophys. J. 98, L9–L11 (2010)

Stochastic Filtering

M.V. Tretyakov
School of Mathematical Sciences, University of Nottingham, Nottingham, UK

Mathematics Subject Classification

60G35; 93E11; 93E10; 62M20; 60H15; 65C30; 65C35; 60H35

Synonyms

Filtering problem for hidden Markov models; Nonlinear filtering

Short Definition

Let $T_1$ and $T_2$ be either a time interval $[0, T]$ or a discrete set. The stochastic filtering problem consists in estimating an unobservable signal $X_t$, $t \in T_1$, based on an observation $\{y_s, \; s \leq t, \; s \in T_2\}$, where the process $y_t$ is related to $X_t$ via a stochastic model.

Description

We restrict ourselves to the case when an unobservable signal and observation are continuous time processes with $T_1 = T_2 = [0, T]$ (see discrete filtering in, e.g., [1, 3, 7]). Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, $\mathcal{F}_t$, $0 \leq t \leq T$, be a filtration satisfying the usual hypotheses, and $(w_t, \mathcal{F}_t)$ and $(v_t, \mathcal{F}_t)$ be $d_1$-dimensional and $r$-dimensional independent standard Wiener processes, respectively. We consider the classical filtering scheme in which the unobservable signal process (“hidden” state) $X_t \in \mathbb{R}^d$ and the observation process $y_t \in \mathbb{R}^r$ satisfy the system of Ito stochastic differential equations (SDE):

$$dX = \alpha(X) ds + \sigma(X) dw_s + \rho(X) dv_s, \quad X_0 = x, \quad (1)$$

$$dy = \beta(X) ds + dv_s, \quad y_0 = 0, \quad (2)$$


where $\alpha(x)$ and $\beta(x)$ are $d$-dimensional and $r$-dimensional vector functions, respectively, and $\sigma(x)$ and $\rho(x)$ are $d \times d_1$-dimensional and $d \times r$-dimensional matrix functions, respectively. The vector $X_0 = x$ in the initial condition for (1) is usually random (i.e., uncertain); it is assumed to be independent of both $w$ and $v$, and its density $\varphi(\cdot)$ is assumed to be known.

Let $f(x)$ be a function on $\mathbb{R}^d$. We assume that the coefficients in (1), (2) and the function $f$ are bounded and have bounded derivatives up to some order. The stochastic filtering problem consists in constructing the estimate $\hat{f}(X_t)$ based on the observation $y_s$, $0 \leq s \leq t$, which is the best in the mean-square sense, i.e., the problem amounts to computing the conditional expectation:

$$\pi_t[f] = \hat{f}(X_t) = E\left( f(X_t) \mid y_s, \, 0 \leq s \leq t \right) =: E^y f(X_t). \quad (3)$$

Applications of nonlinear filtering include tracking, navigation systems, cryptography, image processing, weather forecasting, financial engineering, speech recognition, and many others (see, e.g., [2, 11] and references therein). For a historical account, see, e.g., [1].

Optimal Filter Equations

In this section we give a number of expressions for the optimal filter which involve solving some stochastic evolution equations. Proofs of the results presented in this section and their detailed exposition and extensions are available, e.g., in [1, 4, 6–11].

The solution $\pi_t[f]$ to the filtering problem (3), (1)–(2) satisfies the nonlinear stochastic evolution equation:

$$d\pi_t[f] = \pi_t[L f] dt + \left( \pi_t[M^{\top} f] - \pi_t[f] \, \pi_t[\beta^{\top}] \right) \left( dy - \pi_t[\beta] dt \right), \quad (4)$$

where $L$ is the generator for the diffusion process $X_t$:

$$L f := \frac{1}{2} \sum_{i,j=1}^{d} a_{ij} \frac{\partial^2 f}{\partial x_i \partial x_j} + \sum_{i=1}^{d} \alpha_i \frac{\partial f}{\partial x_i}$$

with $a = \{a_{ij}\}$ being a $d \times d$-dimensional matrix defined by $a(x) = \sigma(x) \sigma^{\top}(x) + \rho(x) \rho^{\top}(x)$, and $M = (M_1, \ldots, M_r)^{\top}$ is a vector of the operators $M_j f := \sum_{i=1}^{d} \rho_{ij} \frac{\partial f}{\partial x_i} + \beta_j f$. The equation of optimal nonlinear filtering (4) is usually called the Kushner-Stratonovich equation or the Fujisaki-Kallianpur-Kunita equation. If the conditional measure $E(I(X(t) \in dx) \mid y_s, \, 0 \leq s \leq t)$ has a smooth density $\pi(t, x)$ with respect to the Lebesgue measure, then it solves the following nonlinear stochastic equation:

$$d\pi(t, x) = L^* \pi(t, x) dt + \left( M^* - \pi_t[\beta] \right)^{\top} \pi(t, x) \left( dy - \pi_t[\beta] dt \right), \quad \pi(0, x) = \varphi(x), \quad (5)$$

where $L^*$ is an adjoint operator to $L$: $L^* f := \frac{1}{2} \sum_{i,j=1}^{d} \frac{\partial^2}{\partial x_i \partial x_j} (a_{ij} f) - \sum_{i=1}^{d} \frac{\partial}{\partial x_i} (\alpha_i f)$, and $M^*$ is an adjoint operator to $M$: $M^* = (M_1^*, \ldots, M_r^*)^{\top}$ with $M_j^* f = -\sum_{i=1}^{d} \frac{\partial}{\partial x_i} (\rho_{ij} f) + \beta_j f$. We note that $\pi_t[f] = \int_{\mathbb{R}^d} f(x) \pi(t, x) dx$. We also remark that the process $\bar{v}_t := y_t - \int_0^t \pi_s[\beta] ds$ is called the innovation process.

Now we will give another expression for the optimal filter. Let

$$\Lambda_t := \exp\left( \int_0^t \beta^{\top}(X_s) dv_s + \frac{1}{2} \int_0^t |\beta(X_s)|^2 ds \right) = \exp\left( \int_0^t \beta^{\top}(X_s) dy_s - \frac{1}{2} \int_0^t |\beta(X_s)|^2 ds \right).$$

According to our assumptions, we have $E \Lambda_t^{-1} = 1$, $0 \leq t \leq T$. We introduce the new probability measure $\tilde{P}$ on $(\Omega, \mathcal{F})$: $\tilde{P}(\Gamma) = \int_{\Gamma} \Lambda_T^{-1} dP(\omega)$. The measures $P$ and $\tilde{P}$ are mutually absolutely continuous. Due to the Girsanov theorem, $y_s$ is a Wiener process on $(\Omega, \mathcal{F}, \mathcal{F}_t, \tilde{P})$, the processes $X_s$ and $y_s$ are independent on $(\Omega, \mathcal{F}, \mathcal{F}_s, \tilde{P})$, and the process $X_s$ satisfies the Ito SDE

$$dX = \left( \alpha(X) - \rho(X) \beta(X) \right) ds + \sigma(X) dw_s + \rho(X) dy_s, \quad X_0 = x. \quad (6)$$

Due to the Kallianpur-Striebel formula (a particular case of the general Bayes formula [7, 9]) for the conditional mean (3), we have

$$\pi_t[f] = \frac{\tilde{E}\left( f(X_t) \Lambda_t \mid y_s, \, 0 \leq s \leq t \right)}{\tilde{E}\left( \Lambda_t \mid y_s, \, 0 \leq s \leq t \right)} = \frac{\tilde{E}^y\left( f(X_t) \Lambda_t \right)}{\tilde{E}^y \Lambda_t}, \quad (7)$$

where $X_t$ is from (6), $\tilde{E}$ means expectation according to the measure $\tilde{P}$, and $\tilde{E}^y(\cdot) := \tilde{E}(\cdot \mid y_s, \, 0 \leq s \leq t)$.

Let $\sigma_t[g] := \tilde{E}^y\left( g(X_t) \Lambda_t \right)$,


where $g$ is a scalar function on $\mathbb{R}^d$. The process $\sigma_t$ is often called the unnormalized optimal filter. It satisfies the linear evolution equation

$$d\sigma_t[g] = \sigma_t[L g] dt + \sigma_t[M^{\top} g] dy_t, \quad \sigma_0[g] = \pi_0[g], \quad (8)$$

which is known as the Zakai equation. Assuming that there is the corresponding smooth unnormalized filtering density $\sigma(t, x)$, it solves the linear stochastic partial differential equation (SPDE) of parabolic type:

$$d\sigma(t, x) = L^* \sigma(t, x) dt + \left( M^* \sigma(t, x) \right)^{\top} dy_t, \quad \sigma(0, x) = \varphi(x). \quad (9)$$

We note that $\sigma_t[g] = \int_{\mathbb{R}^d} g(x) \sigma(t, x) dx$.

The unnormalized optimal filter can also be found as a solution of a backward SPDE. To this end, let us fix a time moment $t$ and introduce the function

$$u^g(s, x; t) = \tilde{E}^y\left( g(X_t^{s,x}) \Lambda_t^{s,x,1} \right), \quad (10)$$

where $x \in \mathbb{R}^d$ is deterministic and $X_{s'}^{s,x}$, $\Lambda_{s'}^{s,x,1}$, $s' \geq s$, is the solution of the Ito SDE

is the solution of the Ito SDE

dX D .˛.X/ � �.X/ˇ.X//ds0 C �.X/dws0

C�.X/dys0 ; Xs D x;

d� D ˇ>.X/�dys0 ; �s D 1:

The function $u^g(s, x; t)$, $s \leq t$, is the solution of the Cauchy problem for the backward linear SPDE:

$$-du = L u \, ds + M^{\top} u \star dy_s, \quad u(t, x) = g(x). \quad (11)$$

The notation “$\star dy$” means the backward Ito integral [5, 9]. If $X_0 = x$ is a random variable with the density $\varphi(\cdot)$, we can write

$$\pi_t[f] = \frac{u^{f,\varphi}(0, t)}{u^{1,\varphi}(0, t)}, \quad (12)$$

where $u^{g,\varphi}(0, t) := \int_{\mathbb{R}^d} u^g(0, x; t) \varphi(x) dx = \tilde{E}^y\left( g(X_t) \Lambda_t \right) = \sigma_t[g]$.

Generally, numerical methods are required to solve optimal filtering equations. For an overview of various numerical approximations for the nonlinear filtering problem, see [1, Chap. 8] together with references therein, and for a number of recent developments see [2].

Linear Filtering and Kalman-Bucy Filter

There are very few cases when explicit formulas for optimal filters are available [1, 7]. The most notable case is when the filtering problem is linear. Consider the system of linear SDE:

$$dX = (a_s + A_s X) ds + Q_s dw_s + G_s dv_s, \quad X_0 = x, \quad (13)$$

$$dy = (b_s + B_s X) ds + dv_s, \quad y_0 = 0, \quad (14)$$

where $A_s$, $B_s$, $Q_s$, and $G_s$ are deterministic matrix functions of time $s$ having the appropriate dimensions; $a_s$ and $b_s$ are deterministic vector functions of time $s$ having the appropriate dimensions; the initial condition $X_0 = x$ is a Gaussian random vector with mean $M_0 \in \mathbb{R}^d$ and covariance matrix $C_0 \in \mathbb{R}^{d \times d}$, and it is independent of both $w$ and $v$; the other notation is as in (1) and (2).

We note that the solution $X_t, y_t$ of the SDE (13) and (14) is a Gaussian process. The conditional distribution of $X_t$ given $\{y_s, \, 0 \leq s \leq t\}$ is Gaussian with mean $\hat{X}_t$ and covariance $P_t$, which satisfy the following system of differential equations [1, 7, 10, 11]:

$$d\hat{X} = \left( a_s + A_s \hat{X} \right) ds + \left( G_s + P B_s^{\top} \right) \left( dy_s - (b_s + B_s \hat{X}) ds \right), \quad \hat{X}_0 = M_0, \quad (15)$$

$$\frac{d}{dt} P = P A_s^{\top} + A_s P - \left( G_s + P B_s^{\top} \right) \left( G_s + P B_s^{\top} \right)^{\top} + Q_s Q_s^{\top} + G_s G_s^{\top}, \quad P_0 = C_0. \quad (16)$$

The solution $\hat{X}_t, P_t$ is called the Kalman-Bucy filter (or linear quadratic estimation). We remark that (15) for the conditional mean $\hat{X}_t = E(X_t \mid y_s, \, 0 \leq s \leq t)$ is a linear SDE, while the solution $P_t$ of the matrix Riccati equation (16) is deterministic and can be precomputed off-line. Online updating of $\hat{X}_t$ with the arrival of new data $y_t$ from observations is computationally very cheap, and the Kalman-Bucy filter and its various modifications are widely used in practical applications.


References

1. Bain, A., Crisan, D.: Fundamentals of Stochastic Filtering. Springer, New York/London (2008)

2. Crisan, D., Rozovskii, B. (eds.): Handbook of Nonlinear Filtering. Oxford University Press, Oxford (2011)

3. Fristedt, B., Jain, N., Krylov, N.: Filtering and Prediction: A Primer. AMS, Providence (2007)

4. Kallianpur, G.: Stochastic Filtering Theory. Springer, New York (1980)

5. Kunita, H.: Stochastic Flows and Stochastic Differential Equations. Cambridge University Press, Cambridge/New York (1990)

6. Kushner, H.J.: Probability Methods for Approximations in Stochastic Control and for Elliptic Equations. Academic, New York (1977)

7. Liptser, R.S., Shiryaev, A.N.: Statistics of Random Processes. Springer, New York (1977)

8. Pardoux, E.: Equations du filtrage non lineaire de la prediction et du lissage. Stochastics 6, 193–231 (1982)

9. Rozovskii, B.L.: Stochastic Evolution Systems, Linear Theory and Application to Nonlinear Filtering. Kluwer Academic, Dordrecht/Boston/London (1991)

10. Stratonovich, R.L.: Conditional Markov Processes and Their Applications to Optimal Control Theory. Elsevier, New York (1968)

11. Xiong, J.: An Introduction to Stochastic Filtering Theory. Oxford University Press, Oxford (2008)

Stochastic ODEs

Peter Kloeden
FB Mathematik, J.W. Goethe-Universitat, Frankfurt am Main, Germany

A scalar stochastic ordinary differential equation (SODE)

$$dX_t = f(t, X_t) \, dt + g(t, X_t) \, dW_t \quad (1)$$

involves a Wiener process $W_t$, $t \geq 0$, which is one of the most fundamental stochastic processes and is often called a Brownian motion. A Wiener process is a Gaussian process with $W_0 = 0$ with probability 1 and normally distributed increments $W_t - W_s$ for $0 \leq s < t$ with

$$E(W_t - W_s) = 0, \qquad E(W_t - W_s)^2 = t - s,$$

where the increments $W_{t_2} - W_{t_1}$ and $W_{t_4} - W_{t_3}$ on non-overlapping intervals (i.e., with $0 \leq t_1 < t_2 \leq t_3 < t_4$) are independent random variables. The sample paths of a Wiener process are continuous, but they are nowhere differentiable.

Consequently, an SODE is not a differential equation at all, but just a symbolic representation for the stochastic integral equation

$$X_t = X_{t_0} + \int_{t_0}^{t} f(s, X_s) \, ds + \int_{t_0}^{t} g(s, X_s) \, dW_s,$$

where the first integral is a deterministic Riemann integral for each sample path. The second integral cannot be defined pathwise as a Riemann-Stieltjes integral because the sample paths of the Wiener process do not have even bounded variation on any bounded time interval. Thus, a new type of stochastic integral is required. An Ito stochastic integral $\int_{t_0}^{T} f(t) \, dW_t$ is defined as the mean-square limit of sums of products of an integrand $f$ evaluated at the left end point of each partition subinterval $[t_n, t_{n+1}]$ times the increment of the Wiener process, i.e.,

$$\int_{t_0}^{T} f(t) \, dW_t := \text{m.s.-}\lim_{N \to \infty} \sum_{n=0}^{N-1} f(t_n) \left( W_{t_{n+1}} - W_{t_n} \right),$$

where $t_{n+1} - t_n = (T - t_0)/N$ for $n = 0, 1, \ldots, N-1$. The integrand function $f$ may be random or even depend on the path of the Wiener process, but $f(t)$ should be independent of future increments of the Wiener process, i.e., $W_{t+h} - W_t$ for $h > 0$.

The Ito stochastic integral has the important properties (the second is called the Ito isometry) that

$$E\left( \int_{t_0}^{T} f(t) \, dW_t \right) = 0, \qquad E\left[ \left( \int_{t_0}^{T} f(t) \, dW_t \right)^2 \right] = \int_{t_0}^{T} E\left( f(t)^2 \right) dt.$$

However, the solutions of Ito SODEs satisfy a different chain rule to that in deterministic calculus, called the Ito formula, i.e.,


$$U(t, X_t) = U(t_0, X_{t_0}) + \int_{t_0}^{t} L^0 U(s, X_s) \, ds + \int_{t_0}^{t} L^1 U(s, X_s) \, dW_s,$$

where

$$L^0 U = \frac{\partial U}{\partial t} + f \frac{\partial U}{\partial x} + \frac{1}{2} g^2 \frac{\partial^2 U}{\partial x^2}, \qquad L^1 U = g \frac{\partial U}{\partial x}.$$

An immediate consequence is that the integration rules and tricks from deterministic calculus do not hold, and different expressions result, e.g.,

$$\int_0^T W_s \, dW_s = \frac{1}{2} W_T^2 - \frac{1}{2} T.$$

The situation for vector-valued SODEs and vector-valued Wiener processes is similar. Details can be found in Refs. [3, 4, 6].

Stratonovich SODEs

There is another stochastic integral called the Stratonovich integral, for which the integrand function is evaluated at the midpoint of each partition subinterval rather than at the left end point. It is written with $\circ \, dW_t$ to distinguish it from the Ito integral. A Stratonovich SODE is thus written

$$dX_t = f(t, X_t) \, dt + g(t, X_t) \circ dW_t.$$

Stratonovich stochastic calculus has the same chain rule as deterministic calculus, which means that Stratonovich SODEs can be solved with the same integration tricks as for ordinary differential equations. However, Stratonovich stochastic integrals do not satisfy the nice properties above for Ito stochastic integrals, which are very convenient for estimates in proofs. Nor does the Stratonovich SODE have the same direct connection with diffusion process theory as the Ito SODE; e.g., the coefficients of the Fokker-Planck equation correspond to those of the Ito SODE (1), i.e.,

$$\frac{\partial p}{\partial t} + \frac{\partial}{\partial x}\big(f\,p\big) - \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\big(g^2\,p\big) = 0.$$

The Itô and Stratonovich stochastic calculi are both mathematically correct. Which one should be used is really a modeling issue, but once one has been chosen, the advantages of the other can be used through a modification of the drift term to obtain the corresponding SODE of the other type that has the same solutions.

Numerical Solution of SODEs

The simplest numerical method for the above SODE (1) is the Euler-Maruyama scheme, given by

$$Y_{n+1} = Y_n + f(t_n, Y_n)\,\Delta_n + g(t_n, Y_n)\,\Delta W_n,$$

where $\Delta_n = t_{n+1} - t_n$ and $\Delta W_n = W_{t_{n+1}} - W_{t_n}$. This is intuitively consistent with the definition of the Itô integral. Here $Y_n$ is a random variable, which is supposed to be an approximation of $X_{t_n}$. The stochastic increments $\Delta W_n$, which are $N(0, \Delta_n)$ distributed, can be generated using, for example, the Box-Muller method. In practice, however, only individual realizations can be computed.
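As an illustration, here is a minimal sketch (ours, not from the article) of the Euler-Maruyama scheme in plain Python/NumPy; the drift $f$, diffusion $g$, and all parameter values below are illustrative choices:

```python
import numpy as np

def euler_maruyama(f, g, x0, t0, T, N, rng):
    """One realization of dX = f(t,X) dt + g(t,X) dW on [t0, T] with N steps."""
    t, dt = t0, (T - t0) / N
    x, path = x0, [x0]
    for _ in range(N):
        dW = rng.normal(0.0, np.sqrt(dt))        # N(0, dt) increment
        x = x + f(t, x) * dt + g(t, x) * dW      # Euler-Maruyama update
        t += dt
        path.append(x)
    return np.array(path)

# Illustrative example: geometric Brownian motion dX = 0.5 X dt + 0.2 X dW.
rng = np.random.default_rng(1)
path = euler_maruyama(lambda t, x: 0.5 * x, lambda t, x: 0.2 * x,
                      x0=1.0, t0=0.0, T=1.0, N=1_000, rng=rng)
```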

Depending on whether the realizations of the solutions or only their probability distributions are required to be close, one distinguishes between strong and weak convergence of numerical schemes, respectively, on a given interval $[t_0, T]$. Let $\Delta = \max_n \Delta_n$ be the maximum step size. Then a numerical scheme is said to converge with strong order $\gamma$ if, for sufficiently small $\Delta$,

$$E\left|X_T - Y^{(\Delta)}_{N_T}\right| \le K_T\,\Delta^{\gamma},$$

and with weak order $\beta$ if

$$\left|E\big(p(X_T)\big) - E\big(p(Y^{(\Delta)}_{N_T})\big)\right| \le K_{p,T}\,\Delta^{\beta}$$

for each polynomial $p$. These are global discretization errors, and the largest possible values of $\gamma$ and $\beta$ give the corresponding strong and weak orders, respectively, of the scheme for a whole class of stochastic differential equations, e.g., those with sufficiently often continuously differentiable coefficient functions. For example, the Euler-Maruyama scheme has strong order $\gamma = \frac12$ and weak order $\beta = 1$, while the Milstein scheme


$$Y_{n+1} = Y_n + f(t_n, Y_n)\,\Delta_n + g(t_n, Y_n)\,\Delta W_n + \frac{1}{2}\,g(t_n, Y_n)\,\frac{\partial g}{\partial x}(t_n, Y_n)\left\{(\Delta W_n)^2 - \Delta_n\right\}$$

has strong order $\gamma = 1$ and weak order $\beta = 1$; see [2, 3, 5].

Note that these convergence orders may be better for specific SODEs within the given class; e.g., the Euler-Maruyama scheme has strong order $\gamma = 1$ for SODEs with additive noise, i.e., for which $g$ does not depend on $x$, since it then coincides with the Milstein scheme.
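A corresponding sketch of the Milstein scheme (again ours and purely illustrative); compared with Euler-Maruyama, the user must also supply $\partial g/\partial x$:

```python
import numpy as np

def milstein(f, g, dgdx, x0, t0, T, N, rng):
    """One realization of dX = f dt + g dW by the Milstein scheme."""
    t, dt = t0, (T - t0) / N
    x = x0
    for _ in range(N):
        dW = rng.normal(0.0, np.sqrt(dt))
        x = (x + f(t, x) * dt + g(t, x) * dW
             + 0.5 * g(t, x) * dgdx(t, x) * (dW**2 - dt))   # Milstein correction
        t += dt
    return x

# Illustrative example: dX = -X dt + sigma X dW, for which dg/dx = sigma.
rng = np.random.default_rng(2)
sigma = 0.3
xT = milstein(lambda t, x: -x, lambda t, x: sigma * x, lambda t, x: sigma,
              x0=1.0, t0=0.0, T=1.0, N=1_000, rng=rng)
```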

The Milstein scheme is derived by expanding the integrand of the stochastic integral with the Itô formula, the stochastic chain rule. The additional term involves the double stochastic integral $\int_{t_n}^{t_{n+1}}\int_{t_n}^{s} dW_u\,dW_s$, which provides more information about the non-smooth Wiener process inside the discretization subinterval and is equal to $\frac12\{(\Delta W_n)^2 - \Delta_n\}$. Numerical schemes of even higher order can be obtained in a similar way.

In general, different schemes are used for strong and weak convergence. The strong stochastic Taylor schemes have strong order $\gamma = \frac12$, 1, $\frac32$, 2, …, whereas the weak stochastic Taylor schemes have weak order $\beta = 1$, 2, 3, …. See [3] for more details. In particular, one should not use heuristic adaptations of numerical schemes for ordinary differential equations, such as Runge-Kutta schemes, since these may not converge to the right solution or may not converge at all.

The proofs of convergence rates in the literature assume that the coefficient functions in the above stochastic Taylor schemes are uniformly bounded, i.e., that the partial derivatives of appropriately high order of the SODE coefficient functions $f$ and $g$ exist and are uniformly bounded. This assumption, however, is not satisfied in many basic and important applications, for example, with polynomial coefficients such as

$$dX_t = -(1 + X_t)(1 - X_t^2)\,dt + (1 - X_t^2)\,dW_t$$

or with square-root coefficients such as in the Cox-Ingersoll-Ross volatility model

$$dV_t = \kappa(\vartheta - V_t)\,dt + \sigma\sqrt{V_t}\,dW_t,$$

which requires $V_t \ge 0$. The second is more difficult because there is a small probability that the numerical iterates may become negative, and various ad hoc

methods have been suggested to prevent this. The paper [1] provides a systematic method to handle both of these problems by using pathwise convergence, i.e.,

$$\sup_{n=0,\dots,N_T}\left|X_{t_n}(\omega) - Y^{(\Delta)}_n(\omega)\right| \longrightarrow 0 \quad \text{as } \Delta \to 0, \qquad \omega \in \Omega.$$

It is quite natural to consider pathwise convergence, since numerical calculations are actually carried out path by path. Moreover, the solutions of some SODEs do not have bounded moments, so pathwise convergence may be the only option.

Iterated Stochastic Integrals

Vector-valued SODEs with vector-valued Wiener processes can be handled similarly. The main new difficulty is how to simulate the multiple stochastic integrals, since these cannot be written as simple formulas of the basic increments, as in the double integral above, when they involve different Wiener processes. In general, such multiple stochastic integrals cannot be avoided, so they must be approximated somehow. One possibility is to use random Fourier series for Brownian bridge processes based on the given Wiener processes; see [3, 5].

Another way is to simulate the integrals themselves by a simpler numerical scheme. For example, the double integral

$$I_{(2,1);n} = \int_{t_n}^{t_{n+1}}\int_{t_n}^{t} dW^2_s\,dW^1_t$$

for two independent Wiener processes $W^1_t$ and $W^2_t$ can be approximated by applying the (vector-valued) Euler-Maruyama scheme to the 2-dimensional Itô SODE (with superscripts labeling components)

$$dX^1_t = X^2_t\,dW^1_t, \qquad dX^2_t = dW^2_t, \qquad (2)$$

over the discretization subinterval $[t_n, t_{n+1}]$ with a suitable step size $\delta = (t_{n+1} - t_n)/K$. The solution of the SODE (2) with the initial condition $X^1_{t_n} = 0$, $X^2_{t_n} = W^2_{t_n}$ at time $t = t_{n+1}$ is given by

$$X^1_{t_{n+1}} = I_{(2,1);n}, \qquad X^2_{t_{n+1}} = W^2_{t_{n+1}}.$$


Writing $\tau_k = t_n + k\delta$ and $\delta W^j_{n,k} = W^j_{\tau_{k+1}} - W^j_{\tau_k}$, the stochastic Euler scheme for the SODE (2) reads

$$Y^1_{k+1} = Y^1_k + Y^2_k\,\delta W^1_{n,k}, \qquad Y^2_{k+1} = Y^2_k + \delta W^2_{n,k}, \qquad 0 \le k \le K-1, \qquad (3)$$

with the initial values $Y^1_0 = 0$, $Y^2_0 = W^2_{t_n}$. The strong order of convergence $\gamma = \frac12$ of the Euler-Maruyama scheme ensures that

$$E\left|Y^1_K - I_{(2,1);n}\right| \le C\sqrt{\delta},$$

so $I_{(2,1);n}$ can be approximated in the Milstein scheme by $Y^1_K$ with $\delta \sim \Delta_n^2$, i.e., $K \sim \Delta_n^{-1}$, without affecting the overall order of convergence.
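The following sketch (ours) implements scheme (3) for one subinterval; taking $t_n = 0$, so that $Y^2_0 = W^2_{t_n} = 0$, the routine returns an approximation of $I_{(2,1);n}$ (the subinterval length and substep count are illustrative):

```python
import numpy as np

def double_integral_I21(dt, K, rng):
    """Approximate I_{(2,1)} over one subinterval of length dt by scheme (3),
    using K Euler substeps of size delta = dt/K."""
    delta = dt / K
    y1, y2 = 0.0, 0.0                      # Y^1_0 = 0 and, with t_n = 0, Y^2_0 = 0
    for _ in range(K):
        dW1 = rng.normal(0.0, np.sqrt(delta))
        dW2 = rng.normal(0.0, np.sqrt(delta))
        y1 += y2 * dW1                     # Y^1_{k+1} = Y^1_k + Y^2_k dW^1
        y2 += dW2                          # Y^2_{k+1} = Y^2_k + dW^2
    return y1

rng = np.random.default_rng(3)
dt = 0.01
I21 = double_integral_I21(dt, K=int(1.0 / dt), rng=rng)   # K ~ dt^{-1}, delta ~ dt^2
```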

Commutative Noise

Identities such as

$$\int_{t_n}^{t_{n+1}}\int_{t_n}^{t} dW^{j_1}_s\,dW^{j_2}_t + \int_{t_n}^{t_{n+1}}\int_{t_n}^{t} dW^{j_2}_s\,dW^{j_1}_t = \Delta W^{j_1}_n\,\Delta W^{j_2}_n$$

allow one to avoid calculating the multiple integrals if the corresponding coefficients in the numerical scheme are identical, in this case if $L^1 g_2(t,x) \equiv L^2 g_1(t,x)$ (where $L^2$ is defined analogously to $L^1$) for an SODE of the form

$$dX_t = f(t, X_t)\,dt + g_1(t, X_t)\,dW^1_t + g_2(t, X_t)\,dW^2_t. \qquad (4)$$

Then the SODE (4) is said to have commutative noise.

Concluding Remarks

The need to approximate multiple stochastic integrals places a practical restriction on the order of strong schemes that can be implemented for a general SODE. Wherever possible, special structural properties of the SODE under investigation, like commutative noise, should be exploited to simplify strong schemes as much as possible. For weak schemes the situation is easier, as the multiple integrals do not need to be approximated so accurately. Moreover, extrapolation of weak schemes is possible.

The important thing is to decide first what kind of approximation one wants, strong or weak, as this will determine the type of scheme that should be used, and then to exploit the structural properties of the SODE under consideration to simplify the scheme that has been chosen to be implemented.

References

1. Jentzen, A., Kloeden, P.E., Neuenkirch, A.: Convergence of numerical approximations of stochastic differential equations on domains: higher order convergence rates without global Lipschitz coefficients. Numer. Math. 112, 41–64 (2009)

2. Kloeden, P.E.: The systematic derivation of higher order numerical methods for stochastic differential equations. Milan J. Math. 70, 187–207 (2002)

3. Kloeden, P.E., Platen, E.: The Numerical Solution of Stochastic Differential Equations, 3rd rev. printing. Springer, Berlin (1999)

4. Mao, X.: Stochastic Differential Equations and Applications, 2nd edn. Horwood, Chichester (2008)

5. Milstein, G.N.: Numerical Integration of Stochastic Differential Equations. Kluwer, Dordrecht (1995)

6. Øksendal, B.: Stochastic Differential Equations: An Introduction with Applications, 6th edn., corr. 4th printing. Springer, Berlin (2007)

Stochastic Simulation

Dieter W. Heermann
Institute for Theoretical Physics, Heidelberg University, Heidelberg, Germany

Synonyms

Brownian Dynamics Simulation; Langevin Simulation; Monte Carlo Simulations

Modelling a system or data, one is often faced with the following:
• Exact data is unavailable or expensive to obtain
• Data is uncertain and/or specified by a probability distribution
or decisions have to be made with respect to the degrees of freedom that are taken explicitly into account. This can be seen by looking at a system with two components. One of the components could be water


molecules and the other component large molecules. The decision is whether to take the water molecules explicitly into account or to treat them implicitly. Since the water molecules move much faster than the large molecules, we can eliminate the water by subsuming its action on the larger molecules into a stochastic force, i.e., a random variable with a specific distribution. Thus we have eliminated some details in favor of a probabilistic description, where perhaps some elements of the model description are given by deterministic rules and others contribute stochastically. Overall, a model derived along the outlined path can be viewed as if an individual state has a probability that may depend on model parameters.

In most of the interesting cases, the number of available states the model has will be so large that they simply cannot be enumerated. A sampling of the states is necessary, such that the most relevant states will be sampled with the correct probability. Assume that the model has some deterministic part. In the above example, the motion of the larger molecules is governed by Newton's equations of motion. These are augmented by stochastic forces mimicking the water molecules. Depending on how exactly this is implemented, one obtains, e.g., the Langevin equations

$$m\ddot{x} = -\nabla U(x) - \gamma m\dot{x} + \eta(t)\sqrt{2\gamma k_B T m}, \qquad (1)$$

where $x$ denotes the state (here the position), $U$ the potential, $m$ the mass of the large molecule, $k_B$ the Boltzmann constant, $T$ the temperature, and $\eta$ the stochastic force, with the properties:

$$\langle \eta(t)\rangle = 0, \qquad (2)$$
$$\langle \eta(t)\,\eta(t')\rangle = \delta(t - t'). \qquad (3)$$

Neglecting the acceleration in the Langevin equation yields the Brownian dynamics equation of motion

$$\dot{x}(t) = -\nabla U(x)/\zeta + \eta(t)\sqrt{2D}, \qquad (4)$$

with $\zeta = \gamma m$ and $D = k_B T/\zeta$.
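As a concrete illustration (ours, not from the article), the following sketch samples Eq. (4) with a simple Euler-Maruyama discretization; the double-well potential $U(x) = (x^2 - 1)^2$ and all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def grad_U(x):
    return 4.0 * x * (x**2 - 1.0)       # gradient of U(x) = (x^2 - 1)^2

zeta, kBT = 1.0, 1.0
D = kBT / zeta                          # D = kB T / zeta, as in Eq. (4)
dt, nsteps = 1e-3, 200_000

x, samples = 0.0, []
for _ in range(nsteps):
    # Euler-Maruyama step of  dx = -grad U(x)/zeta dt + sqrt(2 D) dW
    x += -grad_U(x) / zeta * dt + np.sqrt(2.0 * D * dt) * rng.normal()
    samples.append(x)

# Time average of an observable over the generated states, cf. Eq. (5).
print("<x^2> =", np.mean(np.square(samples)))
```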

Hence the sampling is obtained by using the equations of motion to transition from one state to the next. If enough of the available states (here $x$) are sampled, quantities of interest that depend on the states can be calculated as averages over the generated states:

$$\bar{A} = \sum_x A(x)\,P(x;\alpha), \qquad (5)$$

where $P(x;\alpha)$ is the probability of the state and $\alpha$ a set of parameters (e.g., the temperature $T$, mass $m$, etc.).

A point of view that can be taken is that what Eqs. (1) and (4) accomplish is the generation of a stochastic trajectory through the available states. This can equally well be established by other means. As long as we satisfy the condition that the right probability distribution is generated, we could generate the trajectory by a Monte Carlo method [1].

In a Monte Carlo formulation, a transition probability from a state $x$ to another state $x'$ is specified:

$$W(x'|x). \qquad (6)$$

Together with the proposition probability for the state $x'$, a decision is made to accept or reject the state (Metropolis-Hastings Monte Carlo Method). An advantage of this formulation is that it allows freedom in the choice of the proposition of states and the efficient sampling of the states (importance sampling). In more general terms, what one does is to set up a biased random walk that explores the target distribution (Markov Chain Monte Carlo).

A special case of the sampling that yields a Markov chain is the Gibbs sampler. Assume $x = (x^1, x^2)$ with target $P(x;\alpha)$:

Algorithm 1 Gibbs Sampler Algorithm:
1: initialize $x_0 = (x^1_0, x^2_0)$
2: while $i \le$ max number of samples do
3:   sample $x^1_i \sim P(x^1 \mid x^2_{i-1}; \alpha)$
4:   sample $x^2_i \sim P(x^2 \mid x^1_i; \alpha)$
5: end while

then $\{x^1, x^2\}$ is a Markov chain. Thus we obtain a sequence of states such that we can again apply (5) to compute the quantities of interest.
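As a small worked illustration of Algorithm 1 (our example, not from the article), consider a bivariate Gaussian target with correlation $\rho$, for which both conditionals are one-dimensional Gaussians:

```python
import numpy as np

# Gibbs sampling of a bivariate Gaussian: x1 | x2 ~ N(rho*x2, 1 - rho^2),
# and symmetrically for x2 | x1 (an illustrative, analytically known target).
rng = np.random.default_rng(5)
rho, nsamples = 0.8, 50_000
x1, x2 = 0.0, 0.0
chain = np.empty((nsamples, 2))
for i in range(nsamples):
    x1 = rng.normal(rho * x2, np.sqrt(1.0 - rho**2))   # step 3 of Algorithm 1
    x2 = rng.normal(rho * x1, np.sqrt(1.0 - rho**2))   # step 4 of Algorithm 1
    chain[i] = (x1, x2)

print("empirical correlation:", np.corrcoef(chain.T)[0, 1])   # close to rho
```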

Common to all of the above stochastic simulation methods is the use of random numbers. They are used either for the implementation of the random contribution to the force or for the decision whether a state is accepted or rejected. Thus the quality of the result depends on the quality of the random number generator.


Reference

1. Binder, K., Heermann, D.W.: Monte Carlo Simulation in Statistical Physics: An Introduction. Graduate Texts in Physics, 5th edn. Springer, Heidelberg (2010)

Stochastic Systems

Guang Lin 1,2,3 and George Em Karniadakis 4
1 Fundamental and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
2 School of Mechanical Engineering, Purdue University, West Lafayette, IN, USA
3 Department of Mathematics, Purdue University, West Lafayette, IN, USA
4 Division of Applied Mathematics, Brown University, Providence, RI, USA

Mathematics Subject Classification

93E03

Synonyms

Noisy Systems; Random Systems; Stochastic Systems

Short Definition

A stochastic system may contain one or more elements of random, i.e., nondeterministic, behavior. Compared to a deterministic system, a stochastic system does not always generate the same output for a given input. The elements of a system that can be stochastic in nature include noisy initial conditions, random boundary conditions, random forcing, etc.

Description

Stochastic systems (SS) are encountered in many application domains in science, engineering, and business. They include logistics, transportation, communication networks, financial markets, supply chains, social systems, robust engineering design, statistical physics, systems biology, etc. Stochasticity or randomness is perhaps associated with a bad outcome, but harnessing stochasticity has been pursued in the arts to create beauty, e.g., in the paintings of Jackson Pollock or in the music of Iannis Xenakis. Similarly, it can be exploited in science and engineering to design new devices (e.g., stochastic resonances in the Bang and Olufsen speakers) or to design robust and cost-effective products under the framework of uncertainty-based design and real options [17].

Stochasticity is often associated with uncertainty, either intrinsic or extrinsic, and specifically with the lack of knowledge of the properties of the system; hence quantifying uncertain outcomes and system responses is of great interest in applied probability and scientific computing. This uncertainty can be further classified as aleatory, i.e., statistical, and epistemic, which can be reduced by further measurements or computations of higher resolution. Mathematically, stochasticity can be described by either deterministic or stochastic differential equations. For example, the molecular dynamics of a simple fluid is described by the classical deterministic Newton's law of motion or by the deterministic Navier-Stokes equations, whose outputs, in both cases, may however be stochastic. On the other hand, stochastic elliptic equations can be used to predict the random diffusion of water in porous media, and similarly a stochastic differential equation may be used to model neuronal activity in the brain [9, 10].

Here we will consider systems that are governed by stochastic ordinary and partial differential equations (SODEs and SPDEs), and we will present some effective methods for obtaining stochastic solutions in the next section. In classical stochastic analysis, these terms refer to differential equations subject to white noise, either additive or multiplicative, but in more recent years the same terminology has been adopted for differential equations with color noise, i.e., processes that are correlated in space or time. The color of noise, which can also be pink or violet, may dictate the numerical method used to predict efficiently the response of a stochastic system, and hence it is important to consider this carefully at the modeling stage. Specifically, the correlation length or time scale is the most important parameter of a stochastic process, as it determines the effective dimension of the process; the smaller the correlation scale, the larger the dimensionality of the stochastic system.


Example: To be more specific, in the following we present a tumor cell growth model that involves stochastic inputs which need to be represented according to the correlation structure of available empirical data [27]. The evolution equation is

$$\dot{x}(t;\omega) = G(x) + g(x)\,f_1(t;\omega) + f_2(t;\omega), \qquad x(0;\omega) = x_0(\omega), \qquad (1)$$

where $x(t;\omega)$ denotes the concentration of tumor cells at time $t$,

$$G(x) = x(1 - \theta x) - \beta\,\frac{x}{x+1}, \qquad g(x) = -\frac{x}{x+1},$$

$\beta$ is the immune rate, and $\theta$ is related to the rate of growth of cytotoxic cells. The random process $f_1(t;\omega)$ represents the strength of the treatment (i.e., the dosage of the medicine in chemotherapy or the intensity of the ray in radiotherapy), while the process $f_2(t;\omega)$ is related to other factors, such as drugs and radiotherapy, that restrain the number of tumor cells. The parameters $\beta$, $\theta$ and the covariance structure of the random processes $f_1$ and $f_2$ are usually estimated based on empirical data.

If the processes f1.t; !/ and f2.t; !/ are indepen-dent, they can be represented using the Karhunen-Loeve (K-L) expansion by a zero mean, second-orderrandom process f .t; !/ defined on a probability space.&;F;P/ and indexed over t 2 Œa; b�. Let us denote thecontinuous covariance function of f .t I!/ as C.s; t/.Then the process f .t I!/ can be represented as

f .t I!/ DNdXkD1

p�kek.t/k.!/;

where k.!/ are uncorrelated random variables withzero mean and unitary variance, while �k and ek.t/ are,respectively, eigenvalues and (normalized) eigenfunc-tions of the integral operator with kernel C.s; t/, i.e.,

Z b

a

C.s; t/ek.s/ds D �kek.t/:

The dimensionNd depends strongly on the correlationscale of the kernel C.s; t/. If we rearrange the eigen-values �k in a descending order, then any truncationof the expansion f .t I!/ is optimal in the sense that it

minimizes the mean square error [3, 18, 20]. The K-Lexpansion has been employed to represent randominput processes in many stochastic simulations (see,e.g., [7, 20]).
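As a sketch of how such an expansion is computed in practice (our illustration; it assumes an exponential covariance $C(s,t) = e^{-|s-t|/A}$ and a simple uniform-grid Nyström discretization of the integral operator; all sizes are illustrative):

```python
import numpy as np

A, a, b, n, Nd = 0.5, 0.0, 1.0, 200, 10
t = np.linspace(a, b, n)
w = (b - a) / n                                   # quadrature weight
C = np.exp(-np.abs(t[:, None] - t[None, :]) / A)  # covariance kernel C(s,t)

# Discretized eigenproblem of the covariance operator: (C w) e_k = lambda_k e_k.
lam, e = np.linalg.eigh(C * w)
idx = np.argsort(lam)[::-1][:Nd]                  # descending order, keep N_d terms
lam, e = lam[idx], e[:, idx] / np.sqrt(w)         # normalize e_k in L^2(a, b)

# One realization of f(t) = sum_k sqrt(lambda_k) e_k(t) xi_k.
rng = np.random.default_rng(6)
xi = rng.normal(size=Nd)
f = e @ (np.sqrt(lam) * xi)
```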

Stochastic Modeling and Computational Methods

We present two examples of two fundamentally different descriptions of SS in order to show some of the complexity but also the rich stochastic response that can be obtained: the first is based on a discrete particle model and is governed by SODEs, and the second is a continuum system and is governed by SPDEs.

A Stochastic Particle System: We first describe a stochastic model for a mesoscopic system governed by a modified version of Newton's law of motion, the so-called dissipative particle dynamics (DPD) equations [19]. It consists of particles which correspond to coarse-grained entities, thus representing molecular clusters rather than individual atoms. The particles move off-lattice, interacting with each other through a set of prescribed (conservative and stochastic) and velocity-dependent forces. Specifically, there are three types of forces acting on each dissipative particle: (a) a purely repulsive conservative force, (b) a dissipative force that reduces velocity differences between the particles, and (c) a stochastic force directed along the line connecting the centers of the particles. The last two forces effectively implement a thermostat so that thermal equilibrium is achieved. Correspondingly, the amplitude of these forces is dictated by the fluctuation-dissipation theorem, which ensures that in thermodynamic equilibrium the system will have a canonical distribution. All three forces are modulated by a weight function which specifies the range of interaction or cutoff radius $r_c$ between the particles and renders the interaction local.

We consider a DPD system of $N$ particles of equal mass $m$ (for simplicity in the presentation), with positions $\mathbf{r}_i$ and velocities $\mathbf{v}_i$, which are stochastic in nature. The aforementioned three types of forces exerted on a particle $i$ by particle $j$ are given by

$$\mathbf{F}^c_{ij} = F^{(c)}(r_{ij})\,\mathbf{e}_{ij}, \qquad \mathbf{F}^d_{ij} = -\gamma\,\omega^d(r_{ij})\,(\mathbf{v}_{ij}\cdot\mathbf{e}_{ij})\,\mathbf{e}_{ij}, \qquad \mathbf{F}^r_{ij} = \sigma\,\omega^r(r_{ij})\,\xi_{ij}\,\mathbf{e}_{ij},$$


Stochastic Systems, Fig. 1 Left: Lennard-Jones potential and its averaged soft repulsive-only potential, plotted as $V(r)/k_B T$ versus distance $r$. Right: Polymer chains flowing in a sea of solvent in DPD. For more details see [19]

where $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$, $\mathbf{v}_{ij} = \mathbf{v}_i - \mathbf{v}_j$, $r_{ij} = |\mathbf{r}_{ij}|$, and the unit vector $\mathbf{e}_{ij} = \mathbf{r}_{ij}/r_{ij}$. The variables $\gamma$ and $\sigma$ determine the strength of the dissipative and random forces, respectively; $\xi_{ij}$ are symmetric Gaussian random variables with zero mean and unit variance; and $\omega^d$ and $\omega^r$ are weight functions.

The time evolution of the DPD particles is described by Newton's law

$$d\mathbf{r}_i = \mathbf{v}_i\,\delta t; \qquad d\mathbf{v}_i = \frac{\mathbf{F}^c_i\,\delta t + \mathbf{F}^d_i\,\delta t + \mathbf{F}^r_i\,\sqrt{\delta t}}{m_i},$$

where $\mathbf{F}^c_i = \sum_{j\ne i}\mathbf{F}^c_{ij}$ is the total conservative force acting on particle $i$; $\mathbf{F}^d_i$ and $\mathbf{F}^r_i$ are defined similarly. The velocity increment due to the random force has a factor $\sqrt{\delta t}$ since it represents Brownian motion, which is described by a standard Wiener process with a covariance kernel given by $C_{FF}(t_1, t_2) = e^{-|t_1 - t_2|/A}$, where $A$ is the correlation time of this stochastic process. The conservative force $\mathbf{F}^c$ is typically given in terms of a soft potential, in contrast to the Lennard-Jones potential used in molecular dynamics studies (see Fig. 1 (left)). The dissipative and random forces are characterized by strengths $\omega^d(r_{ij})$ and $\omega^r(r_{ij})$, coupled due to the fluctuation-dissipation theorem.

Several complex fluid systems in industrial and biological applications (DNA chains, polymer gels, lubrication) involve multiscale processes and can be modeled using modifications of the above stochastic DPD equations [19]. Dilute polymer solutions are a typical example, since individual polymer chains form a group of large molecules by atomic standards but are still governed by forces similar to intermolecular ones. Therefore, they form large repeated units exhibiting slow dynamics with possible nonlinear interactions (see Fig. 1 (right)).
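For concreteness, here is a schematic sketch (ours) of the three pairwise DPD forces; it assumes the commonly used linear weight $\omega^r(r) = 1 - r/r_c$ with $\omega^d = (\omega^r)^2$ and the fluctuation-dissipation relation $\sigma^2 = 2\gamma k_B T$, and all parameter values are illustrative:

```python
import numpy as np

def dpd_pair_forces(ri, rj, vi, vj, rng, rc=1.0, a=25.0, gamma=4.5, kBT=1.0):
    """Conservative, dissipative, and random DPD forces on particle i due to j."""
    rij = ri - rj
    r = np.linalg.norm(rij)
    if r >= rc:                          # interactions are local (cutoff r_c)
        return np.zeros(3), np.zeros(3), np.zeros(3)
    eij = rij / r
    wr = 1.0 - r / rc                    # omega^r(r); here omega^d = (omega^r)^2
    Fc = a * wr * eij                    # soft repulsive conservative force
    Fd = -gamma * wr**2 * np.dot(vi - vj, eij) * eij
    sigma = np.sqrt(2.0 * gamma * kBT)   # fluctuation-dissipation relation
    Fr = sigma * wr * rng.normal() * eij # enters the update multiplied by sqrt(dt)
    return Fc, Fd, Fr
```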

A Stochastic Continuum System: Here we present an example from classical aerodynamics on shock dynamics, reformulating the one-dimensional piston problem within the stochastic framework, i.e., we allow for random piston motions which may be changing in time [11]. In particular, we superimpose small random velocity fluctuations on the piston velocity and aim to obtain both analytical and numerical solutions of the stochastic flow response. We consider a piston having a constant velocity, $U_p$, moving into a straight tube filled with a homogeneous gas at rest. A shock wave will be generated ahead of the piston. A sketch of the piston-driven shock tube with the random piston motion superimposed is shown in Fig. 2 (left). There, $U_p$ and $S$ are the deterministic speeds of the piston and of the shock, respectively, and $\rho_0$, $P_0$, $C_0$, $\rho_1$, $P_1$, and $C_1$ are the deterministic density, pressure, and local sound speed ahead of and behind the shock, respectively.


Stochastic Systems, Fig. 2 Left: Sketch of the piston-driven shock tube with random piston motion: the piston moves with speed $U_p + v_p(t)$ and the shock with speed $S + v_s(t)$; ahead of the shock $U = 0$, $P = P_0$, $\rho = \rho_0$, $C = C_0$, while behind the piston $U = U_p + v_p(t)$, $P = P_1$, $\rho = \rho_1$, $C = C_1$. Right: Normalized variance $\langle\xi^2(\tau)\rangle/(U_p q S' A/\alpha)^2$ of the perturbed shock paths as a function of $\tau$. Solid line: perturbation analysis results. Dashed line: early-time asymptotic result, $\langle\xi^2(\tau)\rangle \sim \tau^2$. Dash-dotted line: late-time asymptotic result, $\langle\xi^2(\tau)\rangle \sim \tau$

We now define the stochastic motion of the piston by superimposing a small stochastic component on the steady speed of the piston, i.e., $u_p(t) = U_p[1 + \epsilon V(t,\omega)]$, where $\epsilon$ is the amplitude of the random perturbation. Here $V(t,\omega)$ is modeled as a random process with zero mean and covariance $\langle V(t_1,\omega), V(t_2,\omega)\rangle = e^{-|t_1 - t_2|/A}$, where $A$ is the correlation time; it can be represented by a truncated K-L expansion as explained earlier. Our objective is to quantify the deviation of the perturbed shock paths due to the random piston motion from the unperturbed ones, which are given by $X(t) = S\,t$. If the amplitude $\epsilon$ is small, $0 < \epsilon \ll 1$, the analytical solution for the variance of the perturbed shock paths can be expressed as follows:

$$\langle\xi^2(\tau)\rangle = \left(\frac{U_p q S' A}{\alpha}\right)^2\left[\,2\sum_{n=1}^{\infty}\sum_{m=0}^{n-1}(-r)^{n+m}\,I_{n,m}(\tau) + \sum_{n=0}^{\infty} r^{2n}\,I_{n,n}(\tau)\right], \qquad (2)$$

where $\tau = \alpha t/A$ and the functions $I_{n,m}(\tau)$ ($m \le n$) are explicit combinations of the decaying exponentials $e^{-\beta m \tau}$ and $e^{-\beta n \tau}$ (see [11] for the precise expressions). Here $S' = dS/dU_p$,

$$\alpha = \frac{C_1 + U_p - S}{C_1}, \qquad \beta = \frac{C_1 + U_p - S}{C_1 + S - U_p}, \qquad q = \frac{2}{1+k}, \qquad r = \frac{1-k}{1+k},$$

where the constant $k$ is determined by $S$, $S'$, $U_p$, $C_1$, and the ratio of specific heats $\gamma = c_p/c_v$.

In Fig. 2 (right), the variance of the perturbed shock path, $\langle\xi^2(\tau)\rangle/(U_p q S' A/\alpha)^2$, is plotted as a function of $\tau$ for $U_p = 1.25$, corresponding to a shock Mach number $M = 2$. The asymptotic formulas for small and large $\tau$ are also included in the plot. We observe that the variance of the shock location grows quadratically with time at early times and switches to linear growth at longer times.

The stochastic solutions for the shock paths, for either small or large piston motions, can also be obtained numerically by solving the full nonlinear Euler equations with an unsteady stochastic boundary, namely the piston position, to model the stochastic piston problem. Classic Monte Carlo simulations [4] or quasi-Monte Carlo simulations [2] can be performed for these stochastic simulations. However, due to the slow convergence rate of Monte Carlo methods, it may take thousands of equivalent deterministic simulations to achieve acceptable accuracy. Recently, methods based on generalized polynomial chaos (gPC) expansions have become popular for such SPDEs due to their fast convergence rate for SS with color noise. The term polynomial chaos was first coined by Norbert Wiener in 1938 in his pioneering work on representing Gaussian stochastic processes [22] as generalized Fourier series. In Wiener's work, Hermite polynomials serve


as an orthogonal basis. The gPC method for solving SPDEs is an extension of the polynomial chaos method developed in [7], inspired by the theory of Wiener-Hermite polynomial chaos. The use of Hermite polynomials may not be optimal in applications involving non-Gaussian processes, and hence gPC was proposed in [25] to alleviate the difficulty. In gPC, different kinds of orthogonal polynomials are chosen as a basis depending on the probability distribution of the random inputs. The $P$th-order gPC approximation of the solution $u(x, \xi)$ is obtained by projecting $u$ onto the space $W^P_N$, i.e.,

$$P^P_N u = u^P_N(x, \xi) = \sum_{m=1}^{M} \hat{u}_m(x)\,\Phi_m(\xi), \qquad (3)$$

where $P^P_N$ denotes the orthogonal projection operator from $L^2_\mu$ onto $W^P_N$, $M + 1 = \frac{(N+P)!}{N!\,P!}$, $\hat{u}_m$ are the coefficients, and $\mu$ is the probability measure.

Although gPC was shown to exhibit exponential convergence in approximating stochastic solutions at finite times, gPC may converge slowly, or fail to converge, even in short-time integration, due to a discontinuity of the approximated solution in random space. To this end, the Wiener-Haar method [14, 15] based on wavelets, random domain decomposition [12], multielement gPC (ME-gPC) [21], and the multielement probabilistic collocation method (ME-PCM) [5] were developed to address problems related to the aforementioned discontinuities in random space. Additionally, a more realistic representation of the stochastic inputs associated with various sources of uncertainty in a stochastic system may lead to high-dimensional representations and hence exponential computational complexity, running into the so-called curse of dimensionality. The sparse grid stochastic collocation method [24] and various versions of the ANOVA (ANalysis Of VAriance) method [1, 6, 8, 13, 23, 26] have been employed as effective dimension-reduction techniques for quantifying the uncertainty in stochastic systems with dimensions up to 100.
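As a one-dimensional illustration of the projection (3) (our sketch, not from the article), take a single standard Gaussian input $\xi$, so that the basis $\Phi_m$ consists of probabilists' Hermite polynomials $He_m$, and compute the gPC coefficients of the illustrative target $u(\xi) = e^{\xi}$ by Gauss-Hermite quadrature:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial

P = 8
x, w = hermegauss(50)                  # Gauss quadrature for weight exp(-x^2/2)
w = w / np.sqrt(2.0 * np.pi)           # renormalize to the N(0,1) measure

u = np.exp(x)                          # target function u(xi) = exp(xi)
coeffs = np.array([np.sum(u * hermeval(x, np.eye(P + 1)[m]) * w) / factorial(m)
                   for m in range(P + 1)])    # u_m = E[u He_m] / E[He_m^2]

# For exp(xi) the exact coefficients are e^{1/2}/m!: spectral (factorial) decay.
print(coeffs[:4])
print([np.exp(0.5) / factorial(m) for m in range(4)])
```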

Conclusion

Aristotle's logic has ruled our scientific thinking in the past two millennia. Most scientific models and theories have been constructed from exact models and logical reasoning. It is argued in [16] that SS models and statistical reasoning are more relevant "i) to the world, ii) to science and many parts of mathematics and iii) particularly to understanding the computations in our own minds, than exact models and logical reasoning." Indeed, many real-world problems can be viewed or modeled as SS, with great potential benefits across disciplines, from the physical sciences and engineering to the social sciences. Stochastic modeling can bring in more realism and flexibility and account for uncertain inputs and parametric uncertainty, albeit at the expense of mathematical and computational complexity. However, the rapid mathematical and algorithmic advances already realized at the beginning of the twenty-first century, along with the simultaneous advances in computer speeds and capacity, will help alleviate such difficulties and will make stochastic modeling the standard norm rather than the exception in the years ahead. The three examples we presented in this chapter illustrate the diverse applications of stochastic modeling in biomedicine, materials processing, and fluid mechanics. The same methods or proper extensions can also be applied to quantifying uncertainties in climate modeling; in decision making under uncertainty, e.g., in robust engineering design and in financial markets; but also for modeling the plethora of emerging social networks. Further work on the mathematical and algorithmic formulations of such more complex and high-dimensional systems is required, as current approaches cannot yet deal satisfactorily with white noise, system discontinuities, high dimensions, and long-time integration.

Acknowledgements The authors would like to acknowledge support by the DOE/PNNL Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4).

References

1. Bieri, M., Schwab, C.: Sparse high order FEM for elliptic sPDEs. Comput. Methods Appl. Mech. Eng. 198, 1149–1170 (2009)

2. Caflisch, R.: Monte Carlo and quasi-Monte Carlo methods. Acta Numer. 7, 1–49 (1998)

3. Chien, Y., Fu, K.S.: On the generalized Karhunen-Loève expansion. IEEE Trans. Inf. Theory 13, 518–520 (1967)

4. Fishman, G.: Monte Carlo: Concepts, Algorithms, and Applications. Springer Series in Operations Research and Financial Engineering. Springer, New York (2003)

5. Foo, J., Wan, X., Karniadakis, G.E.: A multi-element probabilistic collocation method for PDEs with parametric uncertainty: error analysis and applications. J. Comput. Phys. 227, 9572–9595 (2008)

6. Foo, J.Y., Karniadakis, G.E.: Multi-element probabilistic collocation in high dimensions. J. Comput. Phys. 229, 1536–1557 (2009)

7. Ghanem, R.G., Spanos, P.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (1991)

8. Griebel, M.: Sparse grids and related approximation schemes for higher-dimensional problems. In: Proceedings of the Conference on Foundations of Computational Mathematics, Santander (2005)

9. van Kampen, N.: Stochastic Processes in Physics and Chemistry, 3rd edn. Elsevier, Amsterdam (2008)

10. Laing, C., Lord, G.J.: Stochastic Methods in Neuroscience. Oxford University Press, Oxford (2009)

11. Lin, G., Su, C.H., Karniadakis, G.E.: The stochastic piston problem. Proc. Natl. Acad. Sci. U.S.A. 101(45), 15840–15845 (2004)

12. Lin, G., Tartakovsky, A.M., Tartakovsky, D.M.: Uncertainty quantification via random domain decomposition and probabilistic collocation on sparse grids. J. Comput. Phys. 229, 6995–7012 (2010)

13. Ma, X., Zabaras, N.: An adaptive hierarchical sparse grid collocation method for the solution of stochastic differential equations. J. Comput. Phys. 228, 3084–3113 (2009)

14. Le Maitre, O.P., Najm, H.N., Ghanem, R.G., Knio, O.M.: Multi-resolution analysis of Wiener-type uncertainty propagation schemes. J. Comput. Phys. 197, 502–531 (2004)

15. Le Maitre, O.P., Najm, H.N., Ghanem, R.G., Knio, O.M.: Uncertainty propagation using Wiener-Haar expansions. J. Comput. Phys. 197, 28–57 (2004)

16. Mumford, D.: The dawning of the age of stochasticity. In: Mathematics Towards the Third Millennium, Rome (1999)

17. de Neufville, R.: Uncertainty Management for Engineering Systems Planning and Design. MIT, Cambridge (2004)

18. Papoulis, A.: Probability, Random Variables and Stochastic Processes, 3rd edn. McGraw-Hill, New York (1991)

19. Symeonidis, V., Karniadakis, G.E., Caswell, B.: A seamless approach to multiscale complex fluid simulation. Comput. Sci. Eng. 7, 39–46 (2005)

20. Venturi, D.: On proper orthogonal decomposition of randomly perturbed fields with applications to flow past a cylinder and natural convection over a horizontal plate. J. Fluid Mech. 559, 215–254 (2006)

21. Wan, X., Karniadakis, G.E.: An adaptive multi-element generalized polynomial chaos method for stochastic differential equations. J. Comput. Phys. 209, 617–642 (2005)

22. Wiener, N.: The homogeneous chaos. Am. J. Math. 60, 897–936 (1938)

23. Winter, C., Guadagnini, A., Nychka, D., Tartakovsky, D.: Multivariate sensitivity analysis of saturated flow through simulated highly heterogeneous groundwater aquifers. J. Comput. Phys. 217, 166–175 (2006)

24. Xiu, D., Hesthaven, J.: High order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput. 27(3), 1118–1139 (2005)

25. Xiu, D., Karniadakis, G.E.: The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619–644 (2002)

26. Yang, X., Choi, M., Lin, G., Karniadakis, G.E.: Adaptive ANOVA decomposition of stochastic incompressible and compressible flows. J. Comput. Phys. 231, 1587–1614 (2012)

27. Zeng, C., Wang, H.: Colored noise enhanced stability in a tumor cell growth system under immune response. J. Stat. Phys. 141, 889–908 (2010)

Stokes or Navier-Stokes Flows

Vivette Girault and Frédéric Hecht
Laboratoire Jacques-Louis Lions, UPMC University of Paris 06 and CNRS, Paris, France

Mathematics Subject Classification

76D05; 76D07; 35Q30; 65N30; 65F10

Definition Terms/Glossary

Boundary layer It refers to the layer of fluid in the immediate vicinity of a bounding surface where the effects of viscosity are significant.

GMRES Abbreviation for the generalized minimal residual algorithm. It refers to an iterative method for the numerical solution of a nonsymmetric system of linear equations.

Precondition It consists in multiplying both sides of a system of linear equations by a suitable matrix, called the preconditioner, so as to reduce the condition number of the system.

The Incompressible Navier-Stokes Model

The incompressible Navier-Stokes system of equations is a widely accepted model for viscous Newtonian incompressible flows. It is extensively used in meteorology, oceanography, canal flows, pipeline flows, the automotive industry, high-speed trains, wind turbines, etc. Computing its solutions accurately is a difficult and important challenge.

A Newtonian fluid is a model whose Cauchy stress tensor depends linearly on the strain tensor, in contrast to non-Newtonian fluids for which this relation is


nonlinear and possibly implicit. For a Navier-Stokes fluid model, the constitutive equation defining the Cauchy stress tensor $T$ is:

$$T = -\pi I + 2\mu\,D(u), \qquad (1)$$

where $\mu > 0$ is the constant viscosity coefficient, representing friction between molecules, $\pi$ is the pressure, $u$ is the velocity, $D(u) = \frac12\left(\nabla u + \nabla u^T\right)$ is the strain tensor, and $(\nabla u)_{ij} = \frac{\partial u_i}{\partial x_j}$ is the gradient tensor. When substituted into the balance of linear momentum

$$\rho\,\frac{du}{dt} = \operatorname{div} T + \rho f, \qquad (2)$$

where $\rho > 0$ is the fluid's density, $f$ is an external body force (e.g., gravity), and $\frac{du}{dt}$ is the material time derivative

$$\frac{du}{dt} = \frac{\partial u}{\partial t} + u\cdot\nabla u, \quad \text{where} \quad u\cdot\nabla u = [\nabla u]\,u = \sum_i u_i\,\frac{\partial u}{\partial x_i}, \qquad (3)$$

(1) gives, after division by $\rho$,

$$\frac{\partial u}{\partial t} + u\cdot\nabla u = -\frac{1}{\rho}\nabla\pi + 2\,\frac{\mu}{\rho}\operatorname{div} D(u) + f.$$

But the density $\rho$ is constant, since the fluid is incompressible. Therefore, renaming the quantities $p = \pi/\rho$ and the kinematic viscosity $\nu = \mu/\rho$, the momentum equation reads:

$$\frac{\partial u}{\partial t} + u\cdot\nabla u = -\nabla p + 2\nu\operatorname{div} D(u) + f. \qquad (4)$$

As the fluid is incompressible, the conservation of mass

$$\frac{\partial\rho}{\partial t} + \operatorname{div}(\rho u) = 0$$

reduces to the incompressibility condition

$$\operatorname{div} u = 0. \qquad (5)$$

From now on, we assume that $\Omega$ is a bounded, connected, open set in $\mathbb{R}^3$, with a suitably piecewise smooth boundary $\partial\Omega$ (essentially, without cusps or multiple points). The relations (4) and (5) are the incompressible Navier-Stokes equations in $\Omega$. They

are complemented with boundary conditions, such as the no-slip condition

$$u = 0 \quad \text{on } \partial\Omega, \qquad (6)$$

and an initial condition

$$u(\cdot, 0) = u_0(\cdot) \ \text{in } \Omega, \quad \text{satisfying} \quad \operatorname{div} u_0 = 0 \ \text{and} \ u_0 = 0 \ \text{on } \partial\Omega. \qquad (7)$$

In practical situations, other boundary conditions may be prescribed. One of the most important occurs in flows past a moving obstacle, in which case (6) is replaced by

$$u = g \quad \text{on } \partial\Omega, \quad \text{where} \quad \int_{\partial\Omega} g\cdot n = 0, \qquad (8)$$

where $g$ is the velocity of the moving body and $n$ denotes the unit exterior normal vector to $\partial\Omega$. To simplify, we shall only discuss (6), but we shall present numerical experiments where (8) is prescribed.

If the boundary conditions do not involve the boundary traction vector $Tn$, (4) can be substantially simplified by using the fluid's incompressibility. Indeed, (5) implies $2\operatorname{div} D(u) = \Delta u$, and (4) becomes

$$\frac{\partial u}{\partial t} + u\cdot\nabla u - \nu\,\Delta u + \nabla p = f. \qquad (9)$$

When written in dimensionless variables (denoted by the same symbols), (9) reads

$$\frac{\partial u}{\partial t} + u\cdot\nabla u - \frac{1}{Re}\,\Delta u + \nabla p = f, \qquad (10)$$

where $Re$ is the Reynolds number, $Re = \frac{LU}{\nu}$, $L$ is a characteristic length, and $U$ a characteristic velocity. When $1 \le Re \le 10^5$, the flow is said to be laminar. Finally, when the Reynolds number is small and the force $f$ does not depend on time, the material time derivative can be neglected. Then, reverting to the original variables, this yields the Stokes system:

$$-\nu\,\Delta u + \nabla p = f, \qquad (11)$$

complemented with (5) and (6).


Some Theoretical Results

Let us start with the Stokes problem (11), (5), (6). In view of both theory and numerics, it is useful to write it in variational form. Let

$$H^1(\Omega) = \{v \in L^2(\Omega)\,;\ \nabla v \in L^2(\Omega)^3\}, \qquad H^1_0(\Omega) = \{v \in H^1(\Omega)\,;\ v = 0 \ \text{on } \partial\Omega\},$$
$$L^2_0(\Omega) = \{v \in L^2(\Omega)\,;\ (v, 1) = 0\},$$

where $(\cdot,\cdot)$ denotes the scalar product of $L^2(\Omega)$:

$$(f, g) = \int_{\Omega} f\,g.$$

Let $H^{-1}(\Omega)$ denote the dual space of $H^1_0(\Omega)$ and $\langle\cdot,\cdot\rangle$ the duality pairing between them. The space $H^1_0$ takes into account the no-slip boundary condition on the velocity, and the space $L^2_0$ is introduced to lift the undetermined constant in the pressure; note that the pressure is only defined by its gradient and hence up to an additive constant in a connected region. Assume that $f$ belongs to $H^{-1}(\Omega)^3$. For our purpose, a suitable variational form is: Find a pair $(u, p) \in H^1_0(\Omega)^3 \times L^2_0(\Omega)$ solution of

$$\forall (v, q) \in H^1_0(\Omega)^3 \times L^2_0(\Omega), \quad \nu(\nabla u, \nabla v) - (p, \operatorname{div} v) - (q, \operatorname{div} u) = \langle f, v\rangle. \qquad (12)$$

Albeit linear, this problem is difficult from both the theoretical and numerical standpoints. The pressure can be eliminated from (12) by working with the space $V$ of divergence-free velocities:

$$V = \{v \in H^1_0(\Omega)^3\,;\ \operatorname{div} v = 0\},$$

but the difficulty lies in recovering the pressure. Existence and continuity of the pressure stem from the following deep result: the divergence operator is an isomorphism from $V^{\perp}$ onto $L^2_0(\Omega)$, where $V^{\perp}$ is the orthogonal complement of $V$ in $H^1_0(\Omega)^3$. In other words, for every $q \in L^2_0(\Omega)$, there exists one and only one $v \in V^{\perp}$ solution of $\operatorname{div} v = q$. Moreover, $v$ depends continuously on $q$:

$$\|\nabla v\|_{L^2(\Omega)} \le \frac{1}{\beta}\,\|q\|_{L^2(\Omega)}, \qquad (13)$$

where $\beta > 0$ only depends on $\Omega$. This inequality is equivalent to the following "inf-sup condition":

$$\inf_{q \in L^2_0(\Omega)}\ \sup_{v \in H^1_0(\Omega)^3} \frac{(\operatorname{div} v, q)}{\|\nabla v\|_{L^2(\Omega)}\,\|q\|_{L^2(\Omega)}} \ge \beta. \qquad (14)$$

Interestingly, (14) is not true when $\partial\Omega$ has an outward cusp, a situation that occurs, for instance, in a flow exterior to two colliding spheres. There is no simple proof of (14). Its difficulty lies in the no-slip boundary condition prescribed on $v$: the proof is much simpler when it is replaced by the weaker condition $v\cdot n = 0$. The above isomorphism easily leads to the following result:

Theorem 1 For any $f$ in $H^{-1}(\Omega)^3$ and any $\nu > 0$, Problem (12) has exactly one solution, and this solution depends continuously on the data:

$$\|\nabla u\|_{L^2(\Omega)} \le \frac{1}{\nu}\,\|f\|_{H^{-1}(\Omega)}, \qquad \|p\|_{L^2(\Omega)} \le \frac{1}{\beta}\,\|f\|_{H^{-1}(\Omega)}. \qquad (15)$$

Next we consider the steady Navier-Stokes system. The natural extension of (12) is: Find a pair $(u, p) \in H^1_0(\Omega)^3 \times L^2_0(\Omega)$ solution of

$$\forall (v, q) \in H^1_0(\Omega)^3 \times L^2_0(\Omega), \quad \nu(\nabla u, \nabla v) + (u\cdot\nabla u, v) - (p, \operatorname{div} v) - (q, \operatorname{div} u) = \langle f, v\rangle. \qquad (16)$$

Its analysis is fairly simple because, on the one hand, the nonlinear convection term $u\cdot\nabla u$ has the following antisymmetry:

$$\forall u \in V,\ \forall v \in H^1(\Omega)^3, \qquad (u\cdot\nabla u, v) = -(u\cdot\nabla v, u), \qquad (17)$$

and, on the other hand, it belongs to $L^{3/2}(\Omega)^3$, which, roughly speaking, is significantly smoother than the data $f$ in $H^{-1}(\Omega)^3$. This enables one to prove existence of solutions, but uniqueness is only guaranteed for a small force or a large viscosity. More precisely, let

$$\mathcal{N} = \sup_{w,u,v \in V,\ w,u,v \ne 0} \frac{(w\cdot\nabla u, v)}{\|\nabla w\|_{L^2(\Omega)}\,\|\nabla u\|_{L^2(\Omega)}\,\|\nabla v\|_{L^2(\Omega)}}.$$

Then we have the next result.


Theorem 2 For any $f$ in $H^{-1}(\Omega)^3$ and any $\nu > 0$, Problem (16) has at least one solution. A sufficient condition for uniqueness is

$$\frac{\mathcal{N}}{\nu^2}\,\|f\|_{H^{-1}(\Omega)} < 1. \qquad (18)$$

Now we turn to the time-dependent Navier-Stokes system. Its analysis is much more complex because in $\mathbb{R}^3$ the dependence of the pressure on time holds in a weaker space. To simplify, we do not treat the most general situation. For a given time interval $[0, T]$, Banach space $X$, and number $r \ge 1$, the relevant spaces are of the form $L^r(0, T; X)$, which is the space of functions defined and measurable in $]0, T[$, such that

$$\int_0^T \|v\|_X^r\,dt < \infty,$$

and

$$W^{1,r}(0, T; X) = \Big\{v \in L^r(0, T; X)\,;\ \frac{dv}{dt} \in L^r(0, T; X)\Big\},$$
$$W^{1,r}_0(0, T; X) = \big\{v \in W^{1,r}(0, T; X)\,;\ v(0) = v(T) = 0\big\},$$

with dual space $W^{-1,r'}(0, T; X)$, $\frac{1}{r} + \frac{1}{r'} = 1$. There are several weak formulations expressing (4)–(7). For numerical purposes, we shall use the following one: Find $u \in L^2(0, T; V) \cap L^{\infty}(0, T; L^2(\Omega)^3)$, with $\frac{du}{dt}$ in $L^{3/2}(0, T; V')$, and $p \in W^{-1,\infty}(0, T; L^2_0(\Omega))$ satisfying a.e. in $]0, T[$

$$\forall (v, q) \in H^1_0(\Omega)^3 \times L^2_0(\Omega), \quad \frac{d}{dt}(u(t), v) + \nu(\nabla u(t), \nabla v) + (u(t)\cdot\nabla u(t), v) - (p(t), \operatorname{div} v) - (q, \operatorname{div} u(t)) = \langle f(t), v\rangle, \qquad (19)$$

with the initial condition (7). This problem always has at least one solution.

Theorem 3 For any $f$ in $L^2(0, T; H^{-1}(\Omega)^3)$, any $\nu > 0$, and any initial data $u_0 \in V$, Problem (19), (7) has at least one solution.

Unfortunately, unconditional uniqueness (which is true in $\mathbb{R}^2$) is to this date an open problem in $\mathbb{R}^3$. In fact, it is one of the Millennium Prize Problems.

Discretization

Solving numerically a steady Stokes system is costly because the theoretical difficulty brought by the pressure is inherited both by its discretization, whatever the scheme, and by the computer implementation of the scheme. This computational difficulty is aggravated by the need for fine meshes to capture the complex flows produced by the Navier-Stokes system. In comparison, when the flow is laminar, at reasonable Reynolds numbers, the additional cost of the nonlinear convection term is minor. There are some satisfactory schemes and algorithms but so far no "miracle" method.

Three important methods are used for discretizing flow problems: finite-element, finite-difference, or finite-volume methods. For the sake of simplicity, we shall mainly consider discretization by finite-element methods. Usually, they consist in using polynomial functions on cells: triangles or quadrilaterals in $\mathbb{R}^2$, or tetrahedra or hexahedra in $\mathbb{R}^3$. Most finite-difference schemes can be derived from finite-element methods on rectangles in $\mathbb{R}^2$ or rectangular boxes in $\mathbb{R}^3$, coupled with quadrature formulas, in which case the mesh may not fit the boundary and a particular treatment may be required near the boundary. Finite volumes are closely related to finite differences but are more complex because they can be defined on very general cells and do not involve functions. All three methods require meshing of the domain, and the success of these methods depends not only on their accuracy but also on how well the mesh is adapted to the problem under consideration. For example, boundary layers may appear at large Reynolds numbers and require locally refined meshes. Constructing a "good" mesh can be difficult and costly, but these important meshing issues are outside the scope of this work. Last, but not least, in many practical applications where the Stokes system is coupled with other equations, it is important that the scheme be locally mass conservative, i.e., that the integral mean value of the velocity's divergence be zero in each cell.

Discretization of the Stokes Problem
Let $\mathcal{T}_h$ be a triangulation of $\Omega$ made of tetrahedra (also called elements) in $\mathbb{R}^3$, the discretization parameter $h$ being the maximum diameter of the elements. For approximation purposes, $\mathcal{T}_h$ is not completely arbitrary:


it is assumed to be shape regular, in the sense that its dihedral angles are uniformly bounded away from 0 and $\pi$, and it has no hanging nodes, in the sense that the intersection of two cells is either empty, or a vertex, or a complete edge, or a complete face. For a given integer $k \ge 0$, let $P_k$ denote the space of polynomials in three variables of total degree $k$, and $Q_k$ that of degree $k$ in each variable. The accuracy of a finite-element space depends on the degree of the polynomials used in each cell; however, we shall concentrate on low-degree elements, as these are most frequently used.

Let us start with locally mass conservative methods and consider first conforming finite-element methods, i.e., where the finite-element space of discrete velocities, say $X_h$, is contained in $H^1_0(\Omega)^3$. Strictly speaking, the space of discrete pressures should be contained in $L^2_0(\Omega)$. However, the zero mean-value constraint destroys the band structure of the matrix, and therefore this constraint is prescribed weakly by means of a small, consistent perturbation. Thus the space of discrete pressures, say $Q_h$, is simply a discrete subspace of $L^2(\Omega)$, and problem (12) is discretized by: Find $(u_h, p_h) \in X_h \times Q_h$ solution of

$$\forall (v_h, q_h) \in X_h \times Q_h, \quad \nu(\nabla u_h, \nabla v_h) - (p_h, \operatorname{div} v_h) - (q_h, \operatorname{div} u_h) - \varepsilon(p_h, q_h) = \langle f, v_h\rangle, \qquad (20)$$

where $\varepsilon > 0$ is a small parameter. Let $M_h = Q_h \cap L^2_0(\Omega)$. Regardless of their individual accuracy, $X_h$ and $M_h$ cannot be chosen independently of each other, because they must satisfy a uniform discrete analogue of (14), namely,

$$\inf_{q_h \in M_h}\ \sup_{v_h \in X_h} \frac{(\operatorname{div} v_h, q_h)}{\|\nabla v_h\|_{L^2(\Omega)}\,\|q_h\|_{L^2(\Omega)}} \ge \beta_*, \qquad (21)$$

for some real number $\beta_* > 0$ independent of $h$. Elements that satisfy (21) are called inf-sup stable. For such elements, the accuracy of (20) depends directly on the individual approximation properties of $X_h$ and $Q_h$.

Roughly speaking, (21) holds when the discrete velocity space is sufficiently rich compared to a given discrete pressure space. Observe also that the discrete velocity's degree in each cell must be at least one in order to guarantee continuity at the interfaces of elements.

We begin with constant pressures, with one degree of freedom at the center of each tetrahedron. It can be checked that, except on some very particular meshes, a conforming $P_1$ velocity space is not sufficiently rich to satisfy (21). This can be remedied by adding one degree of freedom (a vector in the normal direction) at the center of each face, which is achieved by enriching the velocity space with one polynomial of $P_3$ per face. This element, introduced by Bernardi and Raugel, is inf-sup stable and of order one. It is frugal in its number of degrees of freedom and is locally mass conservative, but complex in its implementation because the velocity components are not independent. Of course, three polynomials of $P_3$ (one per component) can be used on each face, so that each component of the discrete velocity is the sum of a polynomial of $P_1$, which guarantees accuracy, and a polynomial of $P_3$, which guarantees inf-sup stability, but the element is more expensive.

The idea of degrees of freedom on faces motivates a nonconforming method where $X_h$ is contained in $L^2(\Omega)^3$ and problem (12) is discretized by the following: Find $(u_h, p_h) \in X_h \times Q_h$ solution of

$$\forall (v_h, q_h) \in X_h \times Q_h, \quad \nu\sum_{T\in\mathcal{T}_h}(\nabla u_h, \nabla v_h)_T - \sum_{T\in\mathcal{T}_h}(p_h, \operatorname{div} v_h)_T - \sum_{T\in\mathcal{T}_h}(q_h, \operatorname{div} u_h)_T - \varepsilon(p_h, q_h) = \langle f, v_h\rangle. \qquad (22)$$

The inf-sup condition (21) is replaced by:

$$\inf_{q_h \in M_h}\ \sup_{v_h \in X_h} \frac{\sum_{T\in\mathcal{T}_h}(\operatorname{div} v_h, q_h)_T}{\|\nabla v_h\|_h\,\|q_h\|_{L^2(\Omega)}} \ge \beta_*, \quad \text{where} \quad \|\cdot\|_h = \Big(\sum_{T\in\mathcal{T}_h}\|\cdot\|^2_{L^2(T)}\Big)^{1/2}. \qquad (23)$$

The simplest example, introduced by Crouzeix and Raviart, is that of a constant pressure per tetrahedron and each velocity component in $P_1$ per tetrahedron, with one degree of freedom at the center of each face. The functions of $X_h$ must be continuous at the center of each interior face and vanish at the center of each boundary face. Then it is not hard to prove that (23) is satisfied. Thus this element has order one, it is fairly economical and mass conservative, and its implementation is fairly straightforward.

The above methods easily extend to hexahedral triangulations with Cartesian structure (i.e., eight hexahedra meeting at any interior vertex), provided the polynomial space $P_k$ is replaced by the inverse image of $Q_k$ on the reference cube. Furthermore, such hexahedral triangulations offer more possibilities. For instance, a conforming, inf-sup stable, locally mass conservative scheme of order two can be obtained by taking, in each cell, a $P_1$ pressure and each component of the velocity in $Q_2$.

Now we turn to conforming methods that use continuous discrete pressures; thus the pressure must be at least $P_1$ in each cell and continuous at the interfaces. Therefore the resulting schemes are not locally mass conservative. It can be checked that velocities with $P_1$ components are not sufficiently rich. The simplest alternative, called the "mini-element" or "$P_1$-bubble," enriches each velocity component in each cell with a polynomial of $P_4$ that vanishes on the cell's boundary, whence the name bubble. This element is inf-sup stable and has order one. Its extension to order two, introduced in $\mathbb{R}^2$ by Hood and Taylor, associates with the same pressure velocities with components in $P_2$. It is inf-sup stable and has order two.

Discretization of the Navier-Stokes System
Here we present straightforward discretizations of (19). The simplest one consists in using a linearized backward Euler finite-difference scheme in time. Let $N > 1$ be an integer, $\delta t = T/N$ the corresponding time step, and $t_n = n\,\delta t$ the discrete times. Starting from a finite-element approximation or interpolation, say $u^0_h$, of $u_0$ satisfying the discrete divergence constraint of (20), we construct a sequence $(u^n_h, p^n_h) \in X_h \times Q_h$ such that for $1 \le n \le N$:

$$\forall (v_h, q_h) \in X_h \times Q_h, \quad \frac{1}{\delta t}(u^n_h - u^{n-1}_h, v_h) + \nu(\nabla u^n_h, \nabla v_h) + c(u^{n-1}_h; u^n_h, v_h) - (p^n_h, \operatorname{div} v_h) - (q_h, \operatorname{div} u^n_h) - \varepsilon(p^n_h, q_h) = \langle f^n, v_h\rangle, \qquad (24)$$

where $f^n$ is an approximation of $f(t_n, \cdot)$ and $c(w_h; u_h, v_h)$ a suitable approximation of the convection term $(w\cdot\nabla u, v)$. As (17) does not necessarily extend to the discrete spaces, the preferred choice, from the standpoint of theory, is

$$c(w_h; u_h, v_h) = \frac{1}{2}\Big[(w_h\cdot\nabla u_h, v_h) - (w_h\cdot\nabla v_h, u_h)\Big], \qquad (25)$$

because it is both consistent and antisymmetric, which makes the analysis easier. But from the standpoint of numerics, the choice

$$c(w_h; u_h, v_h) = (w_h\cdot\nabla u_h, v_h) \qquad (26)$$

is simpler and seems to maintain the same accuracy. Observe that at each step $n$, (24) is a discrete Stokes system with two or three additional linear terms, according to the choice of the form $c$. In both cases, the matrix of the system is not symmetric, which is a strong disadvantage. This can be remedied by completely time-lagging the form $c$, i.e., replacing it by $(u^{n-1}_h\cdot\nabla u^{n-1}_h, v_h)$.

There are cases when none of the above linearizations is satisfactory, and the convection term is approximated by $c(u^n_h; u^n_h, v_h)$ with $c$ defined by (25) or (26). The resulting scheme is nonlinear and must be linearized, for instance, by an inner loop of Newton's iterations. Recall Newton's method for solving the equation $f(x) = 0$ in $\mathbb{R}$: starting from an initial guess $x_0$, compute the sequence $(x_k)$ for $k \ge 0$ by

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.$$
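A direct transcription of this scalar iteration (our sketch, with an illustrative test function $f(x) = x^2 - 2$):

```python
def newton(f, fprime, x0, tol=1e-12, maxit=50):
    """Scalar Newton iteration x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(maxit):
        dx = f(x) / fprime(x)
        x -= dx
        if abs(dx) < tol:
            break
    return x

print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0))  # ~ 1.414213...
```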

Its generalization is straightforward, and at step $n$, starting from $u_{h,0} = u^{n-1}_h$, the inner loop reads: Find $(u_{h,k+1}, p_{h,k+1}) \in X_h \times Q_h$ solution of

$$\forall (v_h, q_h) \in X_h \times Q_h, \quad \frac{1}{\delta t}(u_{h,k+1}, v_h) + \nu(\nabla u_{h,k+1}, \nabla v_h) + c(u_{h,k+1}; u_{h,k}, v_h) + c(u_{h,k}; u_{h,k+1}, v_h) - (p_{h,k+1}, \operatorname{div} v_h) - (q_h, \operatorname{div} u_{h,k+1}) - \varepsilon(p_{h,k+1}, q_h) = c(u_{h,k}; u_{h,k}, v_h) + \langle f^n, v_h\rangle + \frac{1}{\delta t}(u^{n-1}_h, v_h). \qquad (27)$$

Experience shows that only a few iterations are sufficient to match the discretization error. Once this inner loop converges, we set $u^n_h := u_{h,k+1}$, $p^n_h := p_{h,k+1}$.


An interesting alternative to the above schemes is the characteristics method, which uses a discretization of the material time derivative (see (3)):

$$\frac{d}{dt}u(t_n, x) \simeq \frac{1}{\delta t}\left[u^n_h(x) - u^{n-1}_h(\chi^{n-1}(x))\right],$$

where $\chi^{n-1}(x)$ gives the position at time $t_{n-1}$ of a particle located at $x$ at time $t_n$. Its first-order approximation is

$$\chi^{n-1}(x) = x - (\delta t)\,u^{n-1}_h(x).$$

Thus (24) is replaced by

$$\forall (v_h, q_h) \in X_h \times Q_h, \quad \frac{1}{\delta t}\left(u^n_h - u^{n-1}_h \circ \chi^{n-1}, v_h\right) + \nu(\nabla u^n_h, \nabla v_h) - (p^n_h, \operatorname{div} v_h) - (q_h, \operatorname{div} u^n_h) - \varepsilon(p^n_h, q_h) = \langle f^n, v_h\rangle, \qquad (28)$$

whose matrix is symmetric and constant in time and requires no linearization. On the other hand, computing the right-hand side is more complex.

Algorithms

In this section we assume that the discrete spaces are inf-sup stable, and to simplify, we restrict the discussion to conforming discretizations. Any discretization of the time-dependent Navier-Stokes system requires the solution of at least one Stokes problem per time step, whence the importance of an efficient Stokes solver. But since the matrix of the discrete Stokes system is large and indefinite, in $\mathbb{R}^3$ the system is rarely solved simultaneously for $u_h$ and $p_h$. Instead, the computation of $p_h$ is decoupled from that of $u_h$.

Decoupling the Pressure and Velocity
Let $U$ be the vector of velocity unknowns, $P$ that of pressure unknowns, and $F$ the vector of data represented by $(f, v_h)$. Let $A$ be the (symmetric positive definite) matrix of the discrete Laplace operator represented by $\nu(\nabla u_h, \nabla v_h)$, $B$ the matrix of the discrete divergence operator represented by $(q_h, \operatorname{div} u_h)$, and $C$ the matrix of the operator represented by $\varepsilon(p_h, q_h)$. Owing to (21), the matrix $B$ has maximal rank. With this notation, (20) has the form:

$$A\,U - B^T P = F, \qquad -B\,U - C\,P = 0, \qquad (29)$$

whose matrix is symmetric but indeed indefinite. Since $A$ is nonsingular, a partial solution of (29) is

$$\left(B A^{-1} B^T + C\right) P = -B A^{-1} F, \qquad U = A^{-1}\left(F + B^T P\right). \qquad (30)$$

As B has maximal rank, the Schur complementBA�1BT C C is symmetric positive definite, andan iterative gradient algorithm is a good candidate forsolving (30). Indeed, (30) is equivalent to minimizingwith respect to Q the quadratic functional

K.Q/ D 1

2

�AvQ; vQ

�C 1

2

�C Q;Q

�; with

AvQ D F C BTQ: (31)

A variety of gradient algorithms for approximatingthe minimum are obtained by choosing a sequenceof direction vectors W k and an initial vector P0 andcomputing a sequence of vectors Pk defined for eachk � 1 by:

Pk D Pk�1 � �k�1W k�1; where

K.Pk�1 � �k�1W k�1/ D inf�2RK.Pk�1 � �W k�1/:

(32)

Usually the direction vectors $W_k$ are related to the gradient of $K$, whence the name of gradient algorithms. It can be shown that each step of these gradient algorithms requires the solution of a linear system with matrix $A$, which is equivalent to solving a Laplace equation per step. This explains why solving the Stokes system is expensive.
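A minimal SciPy sketch of this strategy, assuming the sparse matrices A, B, C and the data vector F of (29) are given, applies the conjugate gradient method (one particular gradient algorithm for symmetric positive definite systems) to the Schur complement; note that each matrix-vector product indeed costs one solve with A:

import scipy.sparse.linalg as spla

def solve_stokes_schur(A, B, C, F):
    """Solve (29) through the Schur complement equations (30)."""
    solve_A = spla.factorized(A.tocsc())     # factorize A once, reuse the solves

    def schur_matvec(q):
        # One application of (B A^{-1} B^T + C): one "Laplace" solve with A.
        return B @ solve_A(B.T @ q) + C @ q

    n_p = B.shape[0]
    S = spla.LinearOperator((n_p, n_p), matvec=schur_matvec)
    P, info = spla.cg(S, -B @ solve_A(F))    # S is symmetric positive definite
    assert info == 0, "conjugate gradient did not converge"
    U = solve_A(F + B.T @ P)
    return U, P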

The above strategy can be applied to (28) but not to (24), because its matrix $A$ is no longer symmetric. In this case, a GMRES algorithm can be used, but this algorithm is expensive. For this reason, linearization by fully time lagging $c$ may be preferable, because the matrix $A$ becomes symmetric. Of course, when Newton's iterations are performed, as in (27), this option is not available because $A$ is not symmetric. In this case, a splitting strategy may be useful.

Splitting Algorithms
There is a wide variety of algorithms for splitting the nonlinearity from the divergence constraint. Here is an


example where the divergence condition is enforced once every other step. At step $n$:
1. Knowing $(u_h^{n-1}, p_h^{n-1}) \in X_h \times Q_h$, compute an intermediate velocity $(w_h^n, p_h^n) \in X_h \times Q_h$ solution of

$$\forall (v_h, q_h) \in X_h \times Q_h,\quad \frac{1}{\delta t}(w_h^n - u_h^{n-1}, v_h) - (p_h^n, \operatorname{div} v_h) - (q_h, \operatorname{div} w_h^n) - \varepsilon(p_h^n, q_h) = \langle f^n, v_h \rangle - \nu(\nabla u_h^{n-1}, \nabla v_h) - c(u_h^{n-1}; u_h^{n-1}, v_h). \qquad (33)$$

2. Compute $u_h^n \in X_h$ solution of

$$\forall v_h \in X_h,\quad \frac{1}{\delta t}(u_h^n - u_h^{n-1}, v_h) + \nu(\nabla u_h^n, \nabla v_h) + c(w_h^n; u_h^n, v_h) = \langle f^n, v_h \rangle + (p_h^n, \operatorname{div} v_h). \qquad (34)$$

The first step is fairly easy because it reduces to a "Laplace" operator with unknown boundary conditions and therefore can be preconditioned by a Laplace operator, while the second step is an implicit linearized system without constraint.
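Schematically, one pass of this splitting can be organized as in the Python skeleton below; the two solver callbacks stand for the assembled solves of (33) and (34) and are placeholders, since the actual assembly depends on the finite element library used:

def splitting_time_loop(u0, p0, n_steps, dt, solve_step1, solve_step2):
    """Advance (u, p) over n_steps steps of the splitting scheme (33)-(34)."""
    u, p = u0, p0
    for n in range(1, n_steps + 1):
        # Step 1: intermediate velocity w and pressure p, divergence enforced,
        # viscous and convection terms lagged to the previous time step.
        w, p = solve_step1(u, p, dt)
        # Step 2: implicit linearized system without the divergence constraint,
        # convected by the intermediate velocity w.
        u = solve_step2(u, w, p, dt)
    return u, p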

Numerical Experiments

We present here two numerical experiments on benchmarks programmed with the software FreeFem++. More details, including scripts and plots, can be found online at http://www.ljll.math.upmc.fr/hecht/ftp/ECM-2013.

The Driven Cavity in a Square
We use the Taylor-Hood $P_2$-$P_1$ scheme to solve the steady Navier-Stokes equations in the square cavity $\Omega = \,]0,1[\, \times \,]0,1[$ with upper boundary $\Gamma_1 = \,]0,1[\, \times \{1\}$:

$$-\frac{1}{Re}\,\Delta u + u \cdot \nabla u + \nabla p = 0, \qquad \operatorname{div} u = 0,$$
$$u|_{\Gamma_1} = (1, 0), \qquad u|_{\partial\Omega \setminus \Gamma_1} = (0, 0),$$

Stokes or Navier-Stokes Flows, Fig. 1 From left to right: pressure at Re = 9,000, adapted mesh at Re = 8,000, stream function at Re = 9,000. Observe the cascade of corner eddies

Stokes or Navier-Stokes Flows, Fig. 2 Von Karman’s vortex street


with different values of $Re$ ranging from 1 to 9,000. The discontinuity of the boundary values at the two upper corners of the cavity produces a singularity of the pressure. The nonlinearity is solved by Newton's method, the initial guess being obtained by continuation on the Reynolds number, i.e., from the solution computed with the previous Reynolds number. The address of the script is cavityNewton.edp (Fig. 1).
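The continuation strategy itself is simple to express; in the sketch below, solve_steady_ns is a hypothetical stand-in for the Newton solver of the discrete steady problem:

def continuation_in_reynolds(solve_steady_ns, u_init,
                             re_values=(100, 1000, 3000, 6000, 9000)):
    """Warm-start each steady solve from the previous Reynolds number."""
    u = u_init
    for re in re_values:
        u = solve_steady_ns(re, initial_guess=u)  # Newton from previous solution
    return u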

Flow Past an Obstacle: Von Karman's Vortex Street in a Rectangle
The Taylor-Hood $P_2$-$P_1$ scheme in space and the characteristics method in time are used to solve the time-dependent Navier-Stokes equations in a rectangle of $2.2 \times 0.41$ m with a circular hole of diameter $0.1$ m located near the inlet. The density is $\rho = 1.0\ \mathrm{kg/m^3}$ and the kinematic viscosity is $\nu = 10^{-3}\ \mathrm{m^2/s}$. All the relevant data are taken from the benchmark case 2D-2, which can be found online at http://www.mathematik.tu-dortmund.de/lsiii/cms/papers/SchaeferTurek1996.pdf. The script, available at http://www.ljll.math.upmc.fr/hecht/ftp/ECM-2013, is NSCaraCyl-100-mpi.edp or NSCaraCyl-100-seq.edp, together with func-max.idp (Fig. 2).

Bibliographical Notes

The bibliography on Stokes and Navier-Stokes equations, theory and approximation, is very extensive, and we have only selected a few references.

A mechanical derivation of the Navier-Stokes equations can be found in the book by L.D. Landau and E.M. Lifshitz:

Fluid Mechanics, Second Edition, Vol. 6 (Course of Theoretical Physics), Pergamon Press, 1959.

The reader can also refer to the book by C. Truesdell and K.R. Rajagopal:

An Introduction to the Mechanics of Fluids, Modeling and Simulation in Science, Engineering and Technology, Birkhäuser, Basel, 2000.

A thorough theory and description of finite element methods can be found in the book by P.G. Ciarlet:

Basic Error Estimates for Elliptic Problems - Finite Element Methods, Part 1, in Handbook of Numerical Analysis, II, P.G. Ciarlet and J.L. Lions, eds., North-Holland, Amsterdam, 1991.

The reader can also refer to the book by T. Oden and J.N. Reddy:

An Introduction to the Mathematical Theory of Finite Elements, Wiley, New York, 1976.

More computational aspects can be found in the book by A. Ern and J.L. Guermond:

Theory and Practice of Finite Elements, AMS 159, Springer-Verlag, Berlin, 2004.

The reader will find an introduction to the theory and approximation of the Stokes and steady Navier-Stokes equations, including a thorough discussion of the inf-sup condition, in the book by V. Girault and P.A. Raviart:

Finite Element Methods for Navier-Stokes Equations. Theory and Algorithms, SCM 5, Springer-Verlag, Berlin, 1986.

An introduction to the theory and approximation of the time-dependent Navier-Stokes problem is given in the Lecture Notes by V. Girault and P.A. Raviart:

Finite Element Approximation of the Navier-Stokes Equations, Lect. Notes in Math. 749, Springer-Verlag, Berlin, 1979.

Non-conforming finite elements can be found in the reference by M. Crouzeix and P.A. Raviart:

Conforming and non-conforming finite element methods for solving the stationary Stokes problem, RAIRO Anal. Numer. 8 (1973), pp. 33-76.

We also refer to the book by R. Temam:

Navier-Stokes Equations, Theory and Numerical Analysis, North-Holland, Amsterdam, 1979.

The famous Millennium Prize Problem is described at the URL: http://www.claymath.org/millennium/Navier-Stokes Equations.

The reader will find a wide range of numerical methods for fluids in the book by O. Pironneau:

Finite Element Methods for Fluids, Wiley, 1989. See also http://www.ljll.math.upmc.fr/pironneau.

We also refer to the course by R. Rannacher, available online at the URL: http://numerik.iwr.uni-heidelberg.de/Oberwolfach-Seminar/CFD-Course.pdf.

The book by R. Glowinski proposes a very extensive collection of numerical methods, algorithms, and experiments for Stokes and Navier-Stokes equations:

Finite Element Methods for Incompressible Viscous Flow, in Handbook of Numerical Analysis, IX, P.G. Ciarlet and J.L. Lions, eds., North-Holland, Amsterdam, 2003.


Stratosphere and Its Coupling to the Troposphere and Beyond

Edwin P. Gerber
Center for Atmosphere Ocean Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA

Synonyms

Middle atmosphere

Glossary

Mesosphere an atmospheric layer between approximately 50 and 100 km height

Middle atmosphere a region of the atmosphere including the stratosphere and mesosphere

Stratosphere an atmospheric layer between approximately 12 and 50 km

Stratospheric polar vortex a strong, circumpolar jet that forms in the extratropical stratosphere during the winter season in each respective hemisphere

Sudden stratospheric warming a rapid breakdown of the stratospheric polar vortex, accompanied by a sharp warming of the polar stratosphere

Tropopause boundary between the troposphere and stratosphere, generally between 10 and 18 km

Troposphere lowermost layer of the atmosphere, extending from the surface to between 10 and 18 km

Climate engineering the deliberate modification of the Earth's climate system, primarily aimed at reducing the impact of global warming caused by anthropogenic greenhouse gas emissions

Geoengineering see climate engineering

Quasi-biennial oscillation an oscillating pattern of easterly and westerly jets which propagates downward in the tropical stratosphere with a slightly varying period around 28 months

Solar radiation management a form of climate engineering where the net incoming solar radiation to the surface is reduced to offset warming caused by greenhouse gases

Definition

As illustrated in Fig. 1, the Earth's atmosphere can be separated into distinct regions, or "spheres," based on its vertical temperature structure. In the lowermost part of the atmosphere, the troposphere, the temperature declines steeply with height at an average rate of approximately 7 °C per kilometer. At a distinct level, generally between 10-12 km in the extratropics and 16-18 km in the tropics (the separation between these regimes is rather abrupt and can be used as a dynamical indicator delineating the tropics and extratropics), this steep descent of temperature abruptly shallows, transitioning to a layer of the atmosphere where the temperature is initially constant with height and then begins to rise. This abrupt change in the vertical temperature gradient, denoted the tropopause, marks the lower boundary of the stratosphere, which extends to approximately 50 km in height, at which point the temperature begins to fall with height again. The region above is denoted the mesosphere, extending to a second temperature minimum between 85 and 100 km. Together, the stratosphere and mesosphere constitute the middle atmosphere.

The stratosphere was discovered at the dawn of the twentieth century. The vertical temperature gradient, or lapse rate, of the troposphere was established in the eighteenth century from temperature and pressure measurements taken on alpine treks, leading to speculation that the air temperature would approach absolute zero somewhere between 30 and 40 km: presumably the top of the atmosphere. Daring hot air balloon ascents in the late nineteenth century provided hints of a shallowing of the lapse rate (early evidence of the tropopause) but also led to the deaths of aspiring upper-atmosphere meteorologists. Teisserenc de Bort [15] and Assmann [2], working outside of Paris and Berlin, respectively, pioneered the first systematic, unmanned balloon observations of the upper atmosphere, establishing the distinct changes in the temperature structure that mark the stratosphere.

Overview

The lapse rate of the atmosphere reflects the stability of the atmosphere to vertical motion. In the troposphere, the steep decline in temperature reflects near-neutral stability to moist convection.


Stratosphere and Its Coupling to the Troposphere and Beyond, Fig. 1 The vertical temperature structure of the atmosphere (temperature in °C against pressure in hPa and height in km, with the troposphere, stratosphere, mesosphere, and thermosphere delineated). This sample profile shows the January zonal mean temperature at 40° N from the Committee on Space Research (COSPAR) International Reference Atmosphere 1986 model (CIRA-86). The changes in temperature gradients and hence stratification of the atmosphere reflect a difference in the dynamical and radiative processes active in each layer. The heights of the separation points (tropopause, stratopause, and mesopause) vary with latitude and season, and even on daily time scales due to dynamical variability, but are generally sharply defined in any given temperature profile

This is the turbulent weather layer of the atmosphere, where air is in close contact with the surface, with a turnover time scale on the order of days. In the stratosphere, the near-zero or positive lapse rates strongly stratify the flow. Here the air is comparatively isolated from the surface of the Earth, with a typical turnover time scale on the order of a year or more. This distinction in stratification and the resulting impact on the circulation are reflected in the nomenclature: the "troposphere" and "stratosphere" were coined by Teisserenc de Bort, the former the "sphere of change," from the Greek tropos, to turn or whirl, and the latter the "sphere of layers," from the Latin stratus, to spread out.

In this sense, the troposphere can be thought of as a boundary layer in the atmosphere that is well connected to the surface. This said, it is important to note that the mass of the atmosphere is proportional to the pressure: the tropospheric "boundary" layer constitutes roughly 85 % of the mass of the atmosphere and contains all of our weather. The stratosphere contains the vast majority of the remaining atmospheric mass, and the mesosphere and layers above just 0.1 %.

Why Is There a Stratosphere?
The existence of the stratosphere depends on the radiative forcing of the atmosphere by the Sun. As the atmosphere is largely transparent to incoming solar radiation, the bulk of the energy is absorbed at the surface. The presence of greenhouse gases, which absorb infrared light, allows the atmosphere to interact with radiation emitted by the surface. If the atmosphere were "fixed," and so unable to convect (described as a radiative equilibrium), this would lead to an unstable situation, where the air near the surface is much warmer, and so more buoyant, than that above it. At height, however, the temperature profile eventually becomes isothermal, given a fairly uniform distribution of the infrared absorber throughout the atmosphere. (The simplest model for this is the so-called gray radiation scheme, where one assumes that all solar radiation is absorbed at the surface and a single infrared band from the Earth interacts with a uniformly distributed greenhouse gas.)

If we allow the atmosphere to turn over in the vertical, or convect in the nomenclature of atmospheric science, the circulation will produce a well-mixed layer at the bottom with near-neutral stability: the troposphere. The energy available to the air at the surface is finite, however, only allowing it to penetrate so high into the atmosphere. Above the convection sits the stratified isothermal layer that is closer to the radiative equilibrium: the stratosphere. This simplified view of a radiative-convective equilibrium obscures the role of dynamics in setting the stratification in both the troposphere and stratosphere but conveys the essential distinction between the layers. In this respect, "stratospheres" are found on other planets as well, marking the region where the atmosphere becomes more isolated from the surface.

The increase in temperature seen in the Earth's stratosphere (as seen in Fig. 1) is due to the fact that


the atmosphere does interact with the incoming solar radiation through ozone. Ozone is produced by the interaction between molecular oxygen and ultraviolet radiation in the stratosphere [4] and takes over as the dominant absorber of radiation in this band. The decrease in density with height leads to an optimal level for net ultraviolet warming and hence the temperature maximum near the stratopause, which provides the demarcation for the mesosphere above.

Absorption of ultraviolet radiation by stratospheric ozone protects the surface from high-energy radiation. The destruction of ozone over Antarctica by halogenated compounds has had significant health impacts, in addition to its damaging effects on life in the biosphere. As described below, it has also had significant impacts on the tropospheric circulation in the Southern Hemisphere.

Compositional Differences
The separation in the turnover time scale between the troposphere and stratosphere leads to distinct chemical or compositional properties of air in these two regions. Indeed, given a sample of air randomly taken from some point in the atmosphere, one can easily tell whether it came from the troposphere or the stratosphere. The troposphere is rich in water vapor and reactive organic molecules, such as carbon monoxide, which are generated by the biosphere and anthropogenic activity. Stratospheric air is extremely dry, with an average water vapor concentration of approximately 3-5 parts per million, and comparatively rich in ozone. Ozone is a highly reactive molecule (causing lung damage when it is formed in smog at the surface) and does not exist for long in the troposphere.

Scope and Limitations of this Entry
Stratospheric research, albeit only a small part of Earth system science, is a fairly mature field covering a wide range of topics. The remaining goal of this brief entry is to highlight the dynamical interaction between the stratosphere and the troposphere, with particular emphasis on the impact of the stratosphere on surface climate. In the interest of brevity, references have been kept to a minimum, focusing primarily on seminal historical papers and reviews. More detailed references can be found in the review articles listed in Further Reading.

The stratosphere also interacts with the troposphere through the exchange of mass and trace chemical species, such as ozone. This exchange is critical for understanding the atmospheric chemistry in both the troposphere and stratosphere and has significant implications for tropospheric air quality, but it will not be discussed here. For further information, please see the two review articles [8] and [11]. The primary entry point for air into the stratosphere is through the tropics, where the boundary between the troposphere and stratosphere is less well defined. This region is known as the tropical tropopause layer, and the review by [6] will provide the reader with an introduction to research on this topic.

Dynamical Coupling Between the Stratosphere and Troposphere

The term "coupling" suggests interactions between independent components, and so begs the question as to whether the convenient separation of the atmosphere into layers is merited in the first place. The key dynamical distinction between the troposphere and stratosphere lies in the differences in their stratification and the fact that moist processes (i.e., moist convection and latent heat transport) are restricted to the troposphere. The separation between the layers is partly historical, however, evolving in response to the development of weather forecasting and the availability of computational resources.

Midlatitude weather systems are associated with large-scale Rossby waves, which owe their existence to gradients in the effective rotation, or vertical component of vorticity, of the atmosphere, due to variations in the angle between the surface plane and the axis of rotation with latitude. Pioneering work by [5] and [10] showed that the dominant energy-containing waves in the troposphere, wavenumbers roughly 4-8, the so-called synoptic scales, cannot effectively propagate into the stratosphere due to the presence of easterly winds in the summer hemisphere and strong westerly winds in the winter hemisphere. For the purposes of weather prediction, then, the stratosphere could largely be viewed as an upper boundary condition. Models thus resolved the stratosphere as parsimoniously as possible in order to focus numerical resources on the troposphere. The strong winds in the winter stratosphere also impose a stricter Courant-Friedrichs-Lewy condition on the time step of the model, although more advanced numerical techniques have alleviated this problem.


Despite the dynamical separation for weather system-scale waves, larger-scale Rossby waves (wavenumbers 1-3, referred to as planetary scales) can penetrate into the winter stratosphere, allowing for momentum exchange between the layers. In addition, smaller-scale (on the order of 10-1,000 km) gravity waves (gravity waves are generated in stratified fluids, where the restoring force is the gravitational acceleration of fluid parcels, or buoyancy; they are completely distinct from relativistic gravitational waves) also transport momentum between the layers. As computational power increased, leading to a more accurate representation of tropospheric dynamics, it became increasingly clear that a better representation of the stratosphere was necessary to fully understand and simulate surface weather and climate.

Coupling on Daily to Intraseasonal Time Scales
Weather prediction centers have found that an increased representation of the stratosphere improves tropospheric forecasts. On short time scales, however, much of the gain comes from improvements to the tropospheric initial condition. This stems from better assimilation of satellite temperature measurements, which project onto both the troposphere and stratosphere.

The stratosphere itself has a more prominent impact on intraseasonal time scales, due to the intrinsically longer time scales of variability in this region of the atmosphere. The gain in predictability, however, is conditional, depending on the state of the stratosphere. Under normal conditions, the winter stratosphere is very cold in the polar regions, associated with a strong westerly jet, or stratospheric polar vortex. As first observed in the 1950s [12], this strong vortex is sometimes disturbed by the planetary wave activity propagating below, leading to massive changes in temperature (up to 70 °C in a matter of days) and a reversal of the westerly jet, a phenomenon known as a sudden stratospheric warming, or SSW. While the predictability of SSWs is limited by the chaotic nature of tropospheric dynamics, after an SSW the stratosphere remains in an altered state for up to 2-3 months as the polar vortex slowly recovers from the top down.

Baldwin and Dunkerton [3] demonstrated the impact of these changes on the troposphere, showing that an abrupt warming of the stratosphere is followed by an equatorward shift in the tropospheric jet stream and associated storm track. An abnormally cold stratosphere is conversely associated with a poleward shift in the jet stream, although the onset of cold vortex events is not as abrupt. More significantly, the changes in the troposphere extend for up to 2-3 months on the slow time scale of the stratospheric recovery, while under normal conditions the chaotic nature of the tropospheric flow restricts the time scale of jet variations to approximately 10 days. The associated changes in the stratospheric jet stream and tropospheric jet shift are conveniently described by the Northern Annular Mode (NAM) pattern of variability. (The NAM is also known as the Arctic Oscillation, although the annular mode nomenclature has become more prominent.)

The mechanism behind this interaction is still an active area of research. It has become clear, however, that the key lies in the fact that the lower stratosphere influences the formation and dissipation of synoptic-scale Rossby waves, despite the fact that these waves do not penetrate far into the stratosphere.

A shift in the jet stream is associated with a large-scale rearrangement of tropospheric weather patterns. In the Northern Hemisphere, where the stratosphere is more variable due to the stronger planetary wave activity (in short, because there are more continents), an equatorward shift in the jet stream following an SSW leads to colder, stormier weather over much of northern Europe and eastern North America. Forecast skill of temperature, precipitation, and wind anomalies at the surface increases in seasonal forecasts following an SSW. SSWs can be further differentiated into "vortex displacements" and "vortex splits," depending on the dominant wavenumber (1 or 2, respectively) involved in the breakdown of the jet, and recent work has suggested that this has an effect on the tropospheric impact of the warming.

SSWs occur approximately every other year in the Northern Hemisphere, although there is strong intermittency: few events were observed in the 1990s, while they have occurred in most years of the first decades of the twenty-first century. In the Southern Hemisphere, the winter westerlies are stronger and less variable (only one SSW has ever been observed, in 2002), but predictability may be gained around the time of the "final warming," when the stratosphere transitions to its summer state with easterly winds. In some years, this transition is accelerated by planetary wave dynamics, as in an SSW, while in other years it is gradual, associated with a slow radiative relaxation to the summer state.


Coupling on Interannual Time Scales
On longer time scales, the impact of the stratosphere is often felt through a modulation of the intraseasonal coupling between the stratospheric and tropospheric jet streams. Stratospheric dynamics play an important role in internal modes of variability of the atmosphere-ocean system, such as El Niño and the Southern Oscillation (ENSO), and in the response of the climate system to "natural" forcing by the solar cycle and volcanic eruptions.

The quasi-biennial oscillation (QBO) is a nearly periodic oscillation of downward propagating easterly and westerly jets in the tropical stratosphere, with a period of approximately 28 months. It is perhaps the most long-lived mode of variability intrinsic to the atmosphere alone. The QBO influences the surface by modulating the wave coupling between the troposphere and stratosphere in the Northern Hemisphere winter, altering the frequency and intensity of SSWs depending on the phase of the oscillation.

Isolating the impact of the QBO has been complicated by the possible overlap with ENSO, a coupled mode of atmosphere-ocean variability with a time scale of approximately 3-7 years. The relatively short observational record makes it difficult to untangle the signals from measurements alone, and models have only recently been able to simulate these phenomena with reasonable accuracy. ENSO is driven by the interaction between the tropical Pacific Ocean and the zonal circulation of the tropical atmosphere (the Walker circulation). Its impact on the extratropical circulation in the Northern Hemisphere, however, is in part effected through its influence on the stratospheric polar vortex. A warm phase of ENSO is associated with stronger planetary wave propagation into the stratosphere, hence a weaker polar vortex and an equatorward shift in the tropospheric jet stream.

Further complicating the statistical separation between the impacts of ENSO and the QBO is the influence of the 11-year solar cycle, associated with changes in the number of sunspots. While the overall intensity of solar radiation varies by less than 0.1 % of its mean value over the cycle, the variation is stronger in the ultraviolet part of the spectrum. Ultraviolet radiation is primarily absorbed by ozone in the stratosphere, and it has been suggested that the associated changes in temperature structure alter the planetary wave propagation, along the lines of the influence of ENSO and the QBO.

The role of the stratosphere in the climate response to volcanic eruptions is comparatively better understood. While volcanic aerosols are washed out of the troposphere on fairly short time scales by the hydrological cycle, sulfate particles in the stratosphere can last for 1-2 years. These particles reflect the incoming solar radiation, leading to a global cooling of the surface; following Pinatubo, the global surface cooled by 0.1-0.2 K. The overturning circulation of the stratosphere lifts mass up into the tropical stratosphere, transporting it poleward, where it descends in the extratropics. Thus, only tropical eruptions have a persistent, global impact.

Sulfate aerosols warm the stratosphere, thereby modifying the planetary wave coupling. There is some evidence that the net result is a strengthening of the polar vortex, which in turn drives a poleward shift in the tropospheric jets. Hence, eastern North America and northern Europe may experience warmer winters following eruptions, despite the overall cooling impact of the volcano.

The Stratosphere and Climate Change
Anthropogenic forcing has changed the stratosphere, with resulting impacts on the surface. While greenhouse gases warm the troposphere, they increase the radiative efficiency of the stratosphere, leading to a net cooling in this part of the atmosphere. The combination of a warming troposphere and cooling stratosphere leads to a rise in the tropopause and may be one of the most identifiable signatures of global warming on the atmospheric circulation.

While greenhouse gases will have the dominant long-term impact on the climate system, anthropogenic emissions of halogenated compounds, such as chlorofluorocarbons (CFCs), have had the strongest impact on the stratosphere in recent decades. Halogens have caused some destruction of ozone throughout the stratosphere, but the extremely cold temperatures of the Antarctic stratosphere in winter permit the formation of polar stratospheric clouds, which greatly accelerate the production of the Cl and Br atoms that catalyze ozone destruction (e.g., [14]). This led to the ozone hole, the effective destruction of all ozone throughout the middle and lower stratosphere over Antarctica. The effect of ozone loss on ultraviolet radiation was quickly appreciated, and the use of halogenated compounds was regulated and phased out under the Montreal Protocol (which came into force in 1989) and subsequent agreements. Chemistry climate models suggest that


the ozone hole should recover by the end of this century, assuming the ban on halogenated compounds is observed.

It was not appreciated until the first decade of the twenty-first century, however, that the ozone hole has also impacted the circulation of the Southern Hemisphere. The loss of ozone leads to a cooling of the austral polar vortex in springtime and a subsequent poleward shift in the tropospheric jet stream. Note that this poleward shift of the tropospheric jet in response to a stronger stratospheric vortex mirrors the coupling associated with natural variability in the Northern Hemisphere. As reviewed by [16], this shift in the jet stream has had significant impacts on precipitation across much of the Southern Hemisphere.

Stratospheric trends in water vapor also have the potential to affect the surface climate. Despite the minuscule concentration of water vapor in the stratosphere (just 3-5 parts per million), the radiative impact of a greenhouse gas scales logarithmically, so relatively large changes in small concentrations can have a strong impact. Decadal variations in stratospheric water vapor can have an influence on surface climate comparable to decadal changes in greenhouse gas forcing, and there is evidence of a positive feedback of stratospheric water vapor on greenhouse gas forcing.

The stratosphere has also featured prominently in the discussion of climate engineering (or geoengineering), the deliberate alteration of the Earth system to offset the consequences of greenhouse-induced warming. Inspired by the natural cooling impact of volcanic aerosols, the idea is to inject hydrogen sulfide or sulfur dioxide into the stratosphere, where it will form sulfate aerosols. To date, this so-called solar radiation management strategy appears to be among the most feasible and cost-effective means of cooling the Earth's surface, but it comes with many dangers. In particular, it does not alleviate ocean acidification, and the effect is short-lived (a maximum of two years), so it would require continual action ad infinitum, or until greenhouse gas concentrations were returned to safer levels. (In saying this, it is important to note that the natural time scale for carbon dioxide removal is hundreds of thousands of years, and there are no known strategies for accelerating CO2 removal that appear feasible, given current technology.) In addition, the impact of sulfate aerosols on stratospheric ozone and the potential regional effects due to changes in the planetary wave coupling with the troposphere are not well understood.

Further Reading

There are a number of review papers on stratosphere-troposphere coupling in the literature. In particular, [13] provides a comprehensive discussion of stratosphere-troposphere coupling, while [7] highlights developments in the last decade. Andrews et al. [1] provide a classic text on the dynamics of the stratosphere, and [9] provides a wider perspective on the stratosphere, including the history of the field.

References

1. Andrews, D.G., Holton, J.R., Leovy, C.B.: Middle Atmosphere Dynamics. Academic Press, Waltham, MA (1987)

2. Assmann, R.A.: Über die Existenz eines wärmeren Luftstromes in der Höhe von 10 bis 15 km. Sitzungsber. K. Preuss. Akad. Wiss. 24, 495-504 (1902)

3. Baldwin, M.P., Dunkerton, T.J.: Stratospheric harbingers of anomalous weather regimes. Science 294, 581-584 (2001)

4. Chapman, S.: A theory of upper-atmosphere ozone. Mem. R. Meteor. Soc. 3, 103-125 (1930)

5. Charney, J.G., Drazin, P.G.: Propagation of planetary-scale disturbances from the lower into the upper atmosphere. J. Geophys. Res. 66, 83-109 (1961)

6. Fueglistaler, S., Dessler, A.E., Dunkerton, T.J., Folkins, I., Fu, Q., Mote, P.W.: Tropical tropopause layer. Rev. Geophys. 47, RG1004 (2009)

7. Gerber, E.P., Butler, A., Calvo, N., Charlton-Perez, A., Giorgetta, M., Manzini, E., Perlwitz, J., Polvani, L.M., Sassi, F., Scaife, A.A., Shaw, T.A., Son, S.W., Watanabe, S.: Assessing and understanding the impact of stratospheric dynamics and variability on the Earth system. Bull. Am. Meteor. Soc. 93, 845-859 (2012)

8. Holton, J.R., Haynes, P.H., McIntyre, M.E., Douglass, A.R., Rood, R.B., Pfister, L.: Stratosphere-troposphere exchange. Rev. Geophys. 33, 403-439 (1995)

9. Labitzke, K.G., Loon, H.V.: The Stratosphere: Phenomena, History, and Relevance. Springer, Berlin/New York (1999)

10. Matsuno, T.: Vertical propagation of stationary planetary waves in the winter Northern Hemisphere. J. Atmos. Sci. 27, 871-883 (1970)

11. Plumb, R.A.: Stratospheric transport. J. Meteor. Soc. Jpn. 80, 793-809 (2002)

12. Scherhag, R.: Die explosionsartige Stratosphärenerwärmung des Spätwinters 1951/52. Ber. Dtsch. Wetterd. US Zone 6, 51-63 (1952)


13. Shepherd, T.G.: Issues in stratosphere-troposphere coupling. J. Meteor. Soc. Jpn. 80, 769-792 (2002)

14. Solomon, S.: Stratospheric ozone depletion: a review of concepts and history. Rev. Geophys. 37(3), 275-316 (1999). doi:10.1029/1999RG900008

15. Teisserenc de Bort, L.: Variations de la température de l'air libre dans la zone comprise entre 8 km et 13 km d'altitude. C. R. Acad. Sci. Paris 138, 42-45 (1902)

16. Thompson, D.W.J., Solomon, S., Kushner, P.J., England, M.H., Grise, K.M., Karoly, D.J.: Signatures of the Antarctic ozone hole in Southern Hemisphere surface climate change. Nature Geoscience 4, 741-749 (2011). doi:10.1038/NGEO1296

Structural Dynamics

Roger Ohayon1 and Christian Soize2
1 Structural Mechanics and Coupled Systems Laboratory, LMSSC, Conservatoire National des Arts et Métiers (CNAM), Paris, France
2 Laboratoire Modélisation et Simulation Multi-Echelle, MSME UMR 8208 CNRS, Université Paris-Est, Marne-la-Vallée, France

Description

Computational structural dynamics is devoted to the computation of the dynamical responses, in the time or frequency domains, of complex structures subjected to prescribed excitations. The complex structure consists of a deformable medium made of metallic materials, heterogeneous composite materials, and, more generally, metamaterials.

This chapter presents the linear dynamic analysis of complex structures, which is the most frequent case encountered in practice. For this situation, one of the most efficient modeling strategies is based on a formulation in the frequency domain (structural vibrations). There are many advantages to using a frequency domain formulation instead of a time domain formulation, because the modeling can be adapted to the nature of the physical responses which are observed. This is the reason why the low-, the medium-, and the high-frequency ranges are introduced. The different types of vibration responses of a linear dissipative complex structure lead us to define the frequency ranges of analysis. Let $u_j(x,\omega)$ be the Frequency Response Function (FRF) of a component $j$ of the displacement $u(x,\omega)$, at a fixed point $x$ of the structure and at a fixed circular frequency $\omega$ (in rad/s). Figure 1 represents the modulus $|u_j(x,\omega)|$ in log scale and the unwrapped phase $\varphi_j(x,\omega)$ of the FRF, such that $u_j(x,\omega) = |u_j(x,\omega)|\, \exp\{-i\varphi_j(x,\omega)\}$. The unwrapped phase is defined as a continuous function of $\omega$ obtained by adding multiples of $\pm 2\pi$ at the jumps of the phase angle.

Structural Dynamics, Fig. 1 Modulus (left) and unwrapped phase (right) of the FRF as a function of the frequency. Definition of the LF, MF, and HF ranges

The three frequency ranges can then be characterized as follows:
1. The low-frequency range (LF) is defined as the modal domain for which the modulus of the FRF exhibits isolated resonances due to a low modal density of elastic structural modes. The amplitudes of the resonances are driven by the damping, and the phase rotates by $\pi$ at the crossing of each isolated resonance (see Fig. 1). For the LF range, the strategy used consists in computing the elastic structural modes of the associated conservative dynamical system and then constructing a reduced-order model by the Ritz-Galerkin projection. The resulting matrix equation is solved in the time domain or in the frequency domain. It should be noted that substructuring techniques can also be introduced for complex structural systems. Those techniques consist in decomposing the structure into substructures and then constructing a reduced-order model for each substructure, for which the physical degrees of freedom on the coupling interfaces are kept.
2. The high-frequency range (HF) is defined as the range for which there is a high modal density which is constant over the considered frequency range. In this HF range the modulus of the FRF varies slowly as a function of the frequency, and the phase is approximately linear (see Fig. 1). Presently, this frequency range is addressed by various approaches such as Statistical Energy Analysis (SEA), the diffusion-of-energy equation, and transport equations. However, due to the constant increase of computer power and advances in the modeling of complex mechanical systems, this frequency domain is becoming more and more accessible to computational methods.
3. For complex structures (complex geometry, heterogeneous materials, complex junctions, complex boundary conditions, several attached equipments or mechanical subsystems, etc.), an intermediate frequency range, called the medium-frequency range (MF), appears. This MF range does not exist for a simple structure (e.g., a simply supported homogeneous straight beam). This MF range is defined as the intermediate frequency range



for which the modal density exhibits large variations over the frequency band. Due to the presence of the damping, which yields an overlapping of elastic structural modes, the frequency response functions do not exhibit isolated resonances, and the phase varies slowly as a function of the frequency (see Fig. 1). In this MF range, the responses are sensitive to the damping modeling (for a weakly dissipative structure), which is frequency dependent, and sensitive to uncertainties. For this MF range, the computational model is constructed as follows: the reduced-order computational model of the LF range can be used in (i) adapting the finite element discretization to the MF range, (ii) introducing appropriate damping models (due to dissipation in the structure and to transfer of mechanical energy from the structure to mechanical subsystems which are not taken into account in the computational model), and (iii) introducing uncertainty quantification for both the system-parameter uncertainties and the model uncertainties induced by modeling errors.

For the sake of brevity, the case of nonlinear dynamical responses of structures (involving nonlinear constitutive equations, nonlinear geometrical effects, plays, etc.) is not considered in this chapter (see Bibliographical Comments).

Formulation in the Low-Frequency Range for Complex Structures

We consider linear dynamics of a structure around a position of static equilibrium taken as the reference configuration, $\Omega$, which is a three-dimensional bounded connected domain of $\mathbb{R}^3$, with a smooth boundary $\partial\Omega$ for which the external unit normal is denoted as $n$. The generic point of $\Omega$ is $x = (x_1, x_2, x_3)$. Let $u(x,t) = (u_1(x,t), u_2(x,t), u_3(x,t))$ be the displacement of a particle located at point $x$ in $\Omega$ and at a time $t$. The structure is assumed to be free ($\Gamma_0 = \emptyset$), a given surface force field $G(x,t) = (G_1(x,t), G_2(x,t), G_3(x,t))$ is applied to the total boundary $\Gamma = \partial\Omega$, and a given body force field $g(x,t) = (g_1(x,t), g_2(x,t), g_3(x,t))$ is applied in $\Omega$. It is assumed that these external forces are in equilibrium. Below, if $w$ is any quantity depending on $x$, then $w_{,j}$ denotes the partial derivative of $w$ with respect to $x_j$. The classical convention for summation over repeated Latin indices is also used.

The elastodynamic boundary value problem is written, in terms of $u$ and at time $t$, as

$$\rho\, \partial_t^2 u_i(x,t) - \sigma_{ij,j}(x,t) = g_i(x,t) \quad \text{in } \Omega, \qquad (1)$$

$$\sigma_{ij}(x,t)\, n_j(x) = G_i(x,t) \quad \text{on } \Gamma, \qquad (2)$$

$$\sigma_{ij}(x,t) = a_{ijkh}(x)\, \varepsilon_{kh}(u) + b_{ijkh}(x)\, \varepsilon_{kh}(\partial_t u), \qquad \varepsilon_{kh}(u) = (u_{k,h} + u_{h,k})/2. \qquad (3)$$

In (1), $\rho(x)$ is the mass density field and $\sigma_{ij}$ is the Cauchy stress tensor. The constitutive equation is defined by (3), exhibiting an elastic part defined by the tensor $a_{ijkh}(x)$ and a dissipative part defined by the tensor $b_{ijkh}(x)$, independent of $t$ because the model is developed for the low-frequency range, and $\varepsilon(\partial_t u)$ is the linearized strain tensor.

Let $\mathcal{C} = (H^1(\Omega))^3$ be the real Hilbert space of the admissible displacement fields, $x \mapsto v(x)$, on $\Omega$. Considering $t$ as a parameter, the variational formulation of the boundary value problem defined by


(1)-(3) consists, for fixed $t$, in finding $u(\cdot,t)$ in $\mathcal{C}$, such that

$$m(\partial_t^2 u, v) + d(\partial_t u, v) + k(u, v) = f(t; v), \quad \forall v \in \mathcal{C}, \qquad (4)$$

in which the bilinear form $m$ is symmetric and positive definite, the bilinear forms $d$ and $k$ are symmetric, positive semi-definite, and are such that

$$m(u,v) = \int_\Omega \rho\, u_j\, v_j\, dx, \qquad k(u,v) = \int_\Omega a_{ijkh}\, \varepsilon_{kh}(u)\, \varepsilon_{ij}(v)\, dx, \qquad (5)$$

$$d(u,v) = \int_\Omega b_{ijkh}\, \varepsilon_{kh}(u)\, \varepsilon_{ij}(v)\, dx, \qquad f(t;v) = \int_\Omega g_j(t)\, v_j\, dx + \int_\Gamma G_j(t)\, v_j\, ds(x). \qquad (6)$$

The kernel of the bilinear forms $k$ and $d$ is the set of the rigid body displacements, $\mathcal{C}_{\mathrm{rig}} \subset \mathcal{C}$, of dimension 6. Any displacement field $u_{\mathrm{rig}}$ in $\mathcal{C}_{\mathrm{rig}}$ is such that, for all $x$ in $\Omega$, $u_{\mathrm{rig}}(x) = t + \theta \times x$, in which $t$ and $\theta$ are two arbitrary constant vectors in $\mathbb{R}^3$.

For the evolution problem with given Cauchy initial conditions $u(\cdot,0) = u_0$ and $\partial_t u(\cdot,0) = v_0$, the analysis of the existence and uniqueness of a solution requires the introduction of the following hypotheses: $\rho$ is a positive bounded function on $\Omega$; for all $x$ in $\Omega$, the fourth-order tensor $a_{ijkh}(x)$ (resp. $b_{ijkh}(x)$) is symmetric, $a_{ijkh}(x) = a_{jikh}(x) = a_{ijhk}(x) = a_{khij}(x)$, and such that, for all second-order real symmetric tensors $\xi_{ij}$, there is a positive constant $c$ independent of $x$ such that $a_{ijkh}(x)\, \xi_{kh}\, \xi_{ij} \ge c\, \xi_{ij}\, \xi_{ij}$; the functions $a_{ijkh}$ and $b_{ijkh}$ are bounded on $\Omega$; finally, $g$ and $G$ are such that the linear form $v \mapsto f(t;v)$ is continuous on $\mathcal{C}$. It is assumed that, for all $v$ in $\mathcal{C}$, $t \mapsto f(t;v)$ is a square integrable function on $\mathbb{R}$. Let $\mathcal{C}^c$ be the complexified vector space of $\mathcal{C}$ and let $\bar{v}$ be the complex conjugate of $v$. Then, introducing the Fourier transforms $u(x,\omega) = \int_{\mathbb{R}} e^{-i\omega t}\, u(x,t)\, dt$ and $f(\omega;v) = \int_{\mathbb{R}} e^{-i\omega t}\, f(t;v)\, dt$, the variational formulation defined by (4) can be rewritten as follows: For all fixed real $\omega \ne 0$, find $u(\cdot,\omega)$ with values in $\mathcal{C}^c$ such that

$$-\omega^2\, m(u, \bar{v}) + i\omega\, d(u, \bar{v}) + k(u, \bar{v}) = f(\omega; \bar{v}), \quad \forall v \in \mathcal{C}^c. \qquad (7)$$

The finite element discretization with $n$ degrees of freedom of (4) yields the following second-order differential equation on $\mathbb{R}^n$:

$$[M]\, \ddot{U}(t) + [D]\, \dot{U}(t) + [K]\, U(t) = F(t), \qquad (8)$$

and its Fourier transform, which corresponds to the finite element discretization of (7), yields the complex matrix equation

$$\big(-\omega^2\, [M] + i\omega\, [D] + [K]\big)\, U(\omega) = F(\omega), \qquad (9)$$

in which $[M]$ is the mass matrix, which is a symmetric positive definite $(n \times n)$ real matrix, and where $[D]$ and $[K]$ are the damping and stiffness matrices, which are symmetric positive semi-definite $(n \times n)$ real matrices.

Case of a fixed structure. If the structure is fixed on a part $\Gamma_0$ of the boundary $\partial\Omega$ (Dirichlet condition $u = 0$ on $\Gamma_0$), then the given surface force field $G(x,t)$ is applied to the part $\Gamma = \partial\Omega \setminus \Gamma_0$. The space $\mathcal{C}$ of the admissible displacement fields must be replaced by

$$\mathcal{C}_0 = \{v \in \mathcal{C};\; v = 0 \text{ on } \Gamma_0\}. \qquad (10)$$

The complex vector space $\mathcal{C}^c$ must be replaced by the complex vector space $\mathcal{C}_0^c$, which is the complexified vector space of $\mathcal{C}_0$. The real matrices $[D]$ and $[K]$ are positive definite.

Associated Spectral Problem and Structural Modes

Setting $\lambda = \omega^2$, the spectral problem, associated with the variational formulation defined by (4) or (7), is stated as the following generalized eigenvalue problem: Find real $\lambda \ge 0$ and $u \ne 0$ in $\mathcal{C}$ such that

$$k(u, v) = \lambda\, m(u, v), \quad \forall v \in \mathcal{C}. \qquad (11)$$

Rigid body modes (solutions for $\lambda = 0$). Since the dimension of $\mathcal{C}_{\mathrm{rig}}$ is 6, $\lambda = 0$ can be considered as a "zero eigenvalue" of multiplicity 6, denoted as $\lambda_{-5}, \ldots, \lambda_0$. Let $u_{-5}, \ldots, u_0$ be the corresponding eigenfunctions, which are constructed such that the following orthogonality conditions are satisfied: for $\alpha$ and $\beta$ in $\{-5, \ldots, 0\}$, $m(u_\alpha, u_\beta) = \mu_\alpha\, \delta_{\alpha\beta}$ and $k(u_\alpha, u_\beta) = 0$. These eigenfunctions, called the rigid body modes, form a basis of $\mathcal{C}_{\mathrm{rig}} \subset \mathcal{C}$, and any rigid


body displacement $u_{\mathrm{rig}}$ in $\mathcal{C}_{\mathrm{rig}}$ can then be expanded as $u_{\mathrm{rig}} = \sum_{\alpha=-5}^{0} q_\alpha\, u_\alpha$.

Elastic structural modes (solutions for $\lambda \ne 0$). We introduce the subset $\mathcal{C}_{\mathrm{elas}} = \mathcal{C} \setminus \mathcal{C}_{\mathrm{rig}}$. It can be shown that $\mathcal{C} = \mathcal{C}_{\mathrm{rig}} \oplus \mathcal{C}_{\mathrm{elas}}$, which means that any displacement field $u$ in $\mathcal{C}$ has the unique decomposition $u = u_{\mathrm{rig}} + u_{\mathrm{elas}}$, with $u_{\mathrm{rig}}$ in $\mathcal{C}_{\mathrm{rig}}$ and $u_{\mathrm{elas}}$ in $\mathcal{C}_{\mathrm{elas}}$. Consequently, $k(u_{\mathrm{elas}}, v_{\mathrm{elas}})$ defined on $\mathcal{C}_{\mathrm{elas}} \times \mathcal{C}_{\mathrm{elas}}$ is positive definite, and we then have $k(u_{\mathrm{elas}}, u_{\mathrm{elas}}) > 0$ for all $u_{\mathrm{elas}} \ne 0 \in \mathcal{C}_{\mathrm{elas}}$.

Eigenvalue problem restricted to $\mathcal{C}_{\mathrm{elas}}$. The eigenvalue problem restricted to $\mathcal{C}_{\mathrm{elas}}$ is written as follows: Find $\lambda \ne 0$ and $u_{\mathrm{elas}} \ne 0$ in $\mathcal{C}_{\mathrm{elas}}$ such that

$$k(u_{\mathrm{elas}}, v_{\mathrm{elas}}) = \lambda\, m(u_{\mathrm{elas}}, v_{\mathrm{elas}}), \quad \forall v_{\mathrm{elas}} \in \mathcal{C}_{\mathrm{elas}}. \qquad (12)$$

Countable number of positive eigenvalues. It can be proven that the eigenvalue problem defined by (12) admits an increasing sequence of positive eigenvalues $0 < \lambda_1 \le \lambda_2 \le \ldots \le \lambda_\alpha \le \ldots$. In addition, any multiple positive eigenvalue has a finite multiplicity (which means that a multiple positive eigenvalue is repeated a finite number of times).

Orthogonality of the eigenfunctions corresponding to the positive eigenvalues. The sequence of eigenfunctions $\{u_\alpha\}_\alpha$ in $\mathcal{C}_{\mathrm{elas}}$ corresponding to the positive eigenvalues satisfies the following orthogonality conditions:

$$m(u_\alpha, u_\beta) = \mu_\alpha\, \delta_{\alpha\beta}, \qquad k(u_\alpha, u_\beta) = \mu_\alpha\, \omega_\alpha^2\, \delta_{\alpha\beta}, \qquad (13)$$

in which $\omega_\alpha = \sqrt{\lambda_\alpha}$ and where $\mu_\alpha$ is a positive real number depending on the normalization of the eigenfunction $u_\alpha$.

Completeness of the eigenfunctions corresponding to the positive eigenvalues. Let $u_\alpha$ be the eigenfunction associated with the eigenvalue $\lambda_\alpha > 0$. It can be shown that the eigenfunctions $\{u_\alpha\}_{\alpha \ge 1}$ form a complete family in $\mathcal{C}_{\mathrm{elas}}$, and consequently, an arbitrary function $u_{\mathrm{elas}}$ belonging to $\mathcal{C}_{\mathrm{elas}}$ can be expanded as $u_{\mathrm{elas}} = \sum_{\alpha=1}^{+\infty} q_\alpha\, u_\alpha$, in which $\{q_\alpha\}_\alpha$ is a sequence of real numbers. These eigenfunctions are called the elastic structural modes.

Orthogonality between the elastic structural modes and the rigid body modes. We have $k(u_\alpha, u_{\mathrm{rig}}) = 0$ and $m(u_\alpha, u_{\mathrm{rig}}) = 0$. Substituting $u_{\mathrm{rig}}(x) = t + \theta \times x$ into (13) yields

$$\int_\Omega u_\alpha(x)\, \rho(x)\, dx = 0, \qquad \int_\Omega x \times u_\alpha(x)\, \rho(x)\, dx = 0, \qquad (14)$$

which shows that the inertial center of the structure deformed under the elastic structural mode $u_\alpha$ coincides with the inertial center of the undeformed structure.

Expansion of the displacement field using the rigid body modes and the elastic structural modes. Any displacement field $u$ in $\mathcal{C}$ can then be written as $u = \sum_{\alpha=-5}^{0} q_\alpha\, u_\alpha + \sum_{\alpha=1}^{+\infty} q_\alpha\, u_\alpha$.

Terminology. In structural vibrations, $\omega_\alpha > 0$ is called the eigenfrequency of the elastic structural mode $u_\alpha$ (or the eigenmode or mode shape of vibration), whose normalization is defined by the generalized mass $\mu_\alpha$. An elastic structural mode $\alpha$ is defined by the three quantities $\{\omega_\alpha, u_\alpha, \mu_\alpha\}$.

Finite element discretization. The matrix equation of the generalized symmetric eigenvalue problem corresponding to the finite element discretization of (11) is written as

$$[K]\, U = \lambda\, [M]\, U. \qquad (15)$$

For large computational models, this generalized eigenvalue problem is solved using iteration algorithms such as the Krylov sequence, the Lanczos method, and the subspace iteration method, which allow a prescribed number of eigenvalues and associated eigenvectors to be computed.

Case of a fixed structure. If the structure is fixed on $\Gamma_0$, $\mathcal{C}_{\mathrm{rig}}$ is reduced to the empty set and the admissible space $\mathcal{C}_{\mathrm{elas}}$ must be replaced by $\mathcal{C}_0$. In this case the eigenvalues are strictly positive. In addition, the property given for the free structure concerning the inertial center of the structure does not hold.
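Such Lanczos-type computations are available in standard libraries; a sketch with SciPy, assuming sparse symmetric [K] and [M] with [K] positive definite (fixed structure), might read:

import numpy as np
from scipy.sparse.linalg import eigsh

def elastic_modes(K, M, n_modes=10):
    """First eigenpairs of [K] U = lambda [M] U, eq. (15), by shift-invert Lanczos."""
    lam, U = eigsh(K, k=n_modes, M=M, sigma=0.0, which="LM")
    return np.sqrt(lam), U                  # eigenfrequencies (rad/s) and modes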

Reduced-Order Computational Model in the Frequency Domain

In the frequency range, the reduced-order computational model is carried out using the Ritz-Galerkin projection. Let $\mathcal{C}_S$ be the admissible function space such that $\mathcal{C}_S = \mathcal{C}_{\mathrm{elas}}$ for a free structure and $\mathcal{C}_S = \mathcal{C}_0$ for a structure fixed on $\Gamma_0$. Let $\mathcal{C}_{S,N}$ be the subspace of $\mathcal{C}_S$, of dimension $N \ge 1$, spanned by the finite family $\{u_1, \ldots, u_N\}$ of elastic structural modes $u_\alpha$.


For all fixed $\omega$, the projection $u^N(\omega)$ of $u(\omega)$ on the complexified vector space of $\mathcal{C}_{S,N}$ can be written as

$$u^N(x,\omega) = \sum_{\alpha=1}^{N} q_\alpha(\omega)\, u_\alpha(x), \qquad (16)$$

in which $q(\omega) = (q_1(\omega), \ldots, q_N(\omega))$ is the complex-valued vector of the generalized coordinates, which verifies the matrix equation on $\mathbb{C}^N$,

$$\big(-\omega^2\, [\mathcal{M}] + i\omega\, [\mathcal{D}] + [\mathcal{K}]\big)\, q(\omega) = \mathcal{F}(\omega), \qquad (17)$$

in which $[\mathcal{M}]$, $[\mathcal{D}]$, and $[\mathcal{K}]$ are $(N \times N)$ real symmetric positive definite matrices (for a free or a fixed structure). Matrices $[\mathcal{M}]$ and $[\mathcal{K}]$ are diagonal and such that

$$[\mathcal{M}]_{\alpha\beta} = m(u_\beta, u_\alpha) = \mu_\alpha\, \delta_{\alpha\beta}, \qquad [\mathcal{K}]_{\alpha\beta} = k(u_\beta, u_\alpha) = \mu_\alpha\, \omega_\alpha^2\, \delta_{\alpha\beta}. \qquad (18)$$

The damping matrix $[\mathcal{D}]$ is not sparse (fully populated), and the components $\mathcal{F}_\alpha$ of the complex-valued vector of the generalized forces $\mathcal{F} = (\mathcal{F}_1, \ldots, \mathcal{F}_N)$ are such that

$$[\mathcal{D}]_{\alpha\beta} = d(u_\beta, u_\alpha), \qquad \mathcal{F}_\alpha(\omega) = f(\omega; u_\alpha). \qquad (19)$$

The reduced-order model is defined by (16)-(19).

Convergence of the solution constructed with the reduced-order model. For all real $\omega$, (17) has a unique solution $u^N(\omega)$, which converges in $\mathcal{C}_S$ when $N$ goes to infinity. Quasi-static correction terms can be introduced to accelerate the convergence with respect to $N$.

Remarks concerning the diagonalization of the damping operator. When the damping operator is diagonalized by the elastic structural modes, the matrix $[\mathcal{D}]$ defined by (19) is an $(N \times N)$ diagonal matrix which can be written as $[\mathcal{D}]_{\alpha\beta} = d(u_\beta, u_\alpha) = 2\, \mu_\alpha\, \omega_\alpha\, \xi_\alpha\, \delta_{\alpha\beta}$, in which $\mu_\alpha$ and $\omega_\alpha$ are defined by (18). The critical damping rate $\xi_\alpha$ of the elastic structural mode $u_\alpha$ is a positive real number. A weakly damped structure is a structure such that $0 < \xi_\alpha \ll 1$ for all $\alpha$ in $\{1, \ldots, N\}$. Several algebraic expressions exist for diagonalizing the damping bilinear form with the elastic structural modes.
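Collecting (16)-(19) with the diagonal damping model above yields a particularly cheap frequency sweep. The following Python sketch assumes the modal quantities have already been computed (Phi is the n-by-N matrix of retained modes; mu, omega_n, xi hold the generalized masses, eigenfrequencies, and critical damping rates) and is illustrative only:

import numpy as np

def modal_frf(Phi, mu, omega_n, xi, F_phys, omegas):
    """FRF from the reduced-order model (16)-(19) with diagonal modal damping."""
    F_gen = Phi.T @ F_phys                       # generalized forces, eq. (19)
    out = np.empty((len(omegas), Phi.shape[0]), dtype=complex)
    for idx, w in enumerate(omegas):
        # Diagonal of -w^2 [M] + i w [D] + [K] in the modal basis, eqs. (17)-(18).
        diag = mu * (-w**2 + 2j * w * omega_n * xi + omega_n**2)
        out[idx] = Phi @ (F_gen / diag)          # back to physical dofs, eq. (16)
    return out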

Bibliographical Comments

The mathematical aspects related to the variational formulation, existence and uniqueness, and finite element discretization of boundary value problems for elastodynamics can be found in Dautray and Lions [6], Oden and Reddy [11], and Hughes [9]. More details concerning the finite element method can be found in Zienkiewicz and Taylor [14]. Concerning the time integration algorithms in nonlinear computational dynamics, the readers are referred to Belytschko et al. [3] and Har and Tamma [8]. General mechanical formulations in computational structural dynamics, vibration, eigenvalue algorithms, and substructuring techniques can be found in Argyris and Mlejnek [1], Géradin and Rixen [7], Bathe and Wilson [2], and Craig and Bampton [5]. For computational structural dynamics in the low- and the medium-frequency ranges and extensions to structural acoustics, we refer the reader to Ohayon and Soize [12]. Various formulations for the high-frequency range can be found in Lyon and Dejong [10] for Statistical Energy Analysis, and in Chap. 4 of Bensoussan et al. [4] for diffusion of energy and transport equations. Concerning uncertainty quantification (UQ) in computational structural dynamics, we refer the reader to Soize [13].

References

1. Argyris, J., Mlejnek, H.P.: Dynamics of Structures. North-Holland, Amsterdam (1991)

2. Bathe, K.J., Wilson, E.L.: Numerical Methods in Finite Element Analysis. Prentice-Hall, New York (1976)

3. Belytschko, T., Liu, W.K., Moran, B.: Nonlinear Finite Elements for Continua and Structures. Wiley, Chichester (2000)

4. Bensoussan, A., Lions, J.L., Papanicolaou, G.: Asymptotic Analysis for Periodic Structures. AMS Chelsea Publishing, Providence (2010)

5. Craig, R.R., Bampton, M.C.C.: Coupling of substructures for dynamic analysis. AIAA J. 6, 1313-1319 (1968)

6. Dautray, R., Lions, J.L.: Mathematical Analysis and Numerical Methods for Science and Technology. Springer, Berlin (2000)

7. Géradin, M., Rixen, D.: Mechanical Vibrations: Theory and Applications to Structural Dynamics, 2nd edn. Wiley, Chichester (1997)

8. Har, J., Tamma, K.: Advances in Computational Dynamics of Particles, Materials and Structures. Wiley, Chichester (2012)


9. Hughes, T.J.R.: The Finite Element Method: Linear Static and Dynamic Finite Element Analysis. Dover, New York (2000)

10. Lyon, R.H., Dejong, R.G.: Theory and Application of Statistical Energy Analysis, 2nd edn. Butterworth-Heinemann, Boston (1995)

11. Oden, J.T., Reddy, J.N.: An Introduction to the Mathematical Theory of Finite Elements. Dover, New York (2011)

12. Ohayon, R., Soize, C.: Structural Acoustics and Vibration. Academic, London (1998)

13. Soize, C.: Stochastic Models of Uncertainties in Computational Mechanics, vol. 2. Engineering Mechanics Institute (EMI) of the American Society of Civil Engineers (ASCE), Reston (2012)

14. Zienkiewicz, O.C., Taylor, R.L.: The Finite Element Method: The Basis, vol. 1, 5th edn. Butterworth-Heinemann, Oxford (2000)

Subdivision Schemes

Nira Dyn
School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv, Israel

Mathematics Subject Classification

Primary: 65D07; 65D10; 65D17. Secondary: 41A15; 41A25

Definition

A subdivision scheme is a method for generating a continuous function from discrete data, by repeated applications of refinement rules. A refinement rule operates on a set of data points and generates a denser set using local mappings. The function generated by a convergent subdivision scheme is the limit of the sequence of sets of points generated by the repeated refinements.

Description

Subdivision schemes are efficient computational methods for the design, representation, and approximation of curves and surfaces in 3D and for the generation of refinable functions, which are instrumental in the construction of wavelets.

The “classical” subdivision schemes are stationary and linear, applying the same linear refinement rule at each refinement level. The theory of these schemes is well developed and well understood; see [2, 7, 9, 15] and references therein.

Nonlinear schemes were designed for the approximation of piecewise smooth functions (see, e.g., [1, 4]), for taking into account the geometry of the initial points in the design of curves/surfaces (see, e.g., [5, 8, 11]), and for manifold-valued data (see, e.g., [12, 14, 16]). These schemes were studied at a later stage, and in many cases their analysis is based on their proximity to linear schemes.

Linear Schemes
A linear refinement rule operates on a set of points in $\mathbb{R}^d$ with topological relations among them, expressed by relating the points to the vertices of a regular grid in $\mathbb{R}^s$. In the design of 3D surfaces, $d = 3$ and $s = 2$.

The refinement consists of a rule for refining the grid and a rule for defining the new points corresponding to the vertices of the refined grid. The most common refinement is binary, and the most common grid is $\mathbb{Z}^s$.

For a set of points $P = \{P_i \in \mathbb{R}^d : i \in \mathbb{Z}^s\}$, related to $2^{-k}\mathbb{Z}^s$, the binary refinement rule $R$ generates points related to $2^{-k-1}\mathbb{Z}^s$, of the form
$$(RP)_i = \sum_{j \in \mathbb{Z}^s} a_{i-2j} P_j, \quad i \in \mathbb{Z}^s, \qquad (1)$$
with the point $(RP)_i$ related to the vertex $i 2^{-k-1}$. The set of coefficients $\{a_i \in \mathbb{R} : i \in \mathbb{Z}^s\}$ is called the mask of the refinement rule, and only a finite number of the coefficients are nonzero, reflecting the locality of the refinement.

When the same refinement rule is applied in all refinement levels, the scheme is called stationary, while if different linear refinement rules are applied in different refinement levels, the scheme is called nonstationary (see, e.g., [9]).

A stationary scheme is defined to be convergent (in the $L_\infty$-norm) if for any initial set of points $P$, there exists a continuous function $F$ defined on $\mathbb{R}^s$ (with values in $\mathbb{R}^d$) such that
$$\lim_{k \to \infty} \sup_{i \in \mathbb{Z}^s} \left| F(i 2^{-k}) - (R^k P)_i \right| = 0, \qquad (2)$$
and if for at least one initial set of points, $F \not\equiv 0$. A similar definition of convergence is used for all other types of subdivision schemes, with $R^k$ replaced by the appropriate product of refinement rules.

Although the refinement (1) is defined on all of $\mathbb{Z}^s$, the finite support of the mask guarantees that the limit of the subdivision scheme at a point is affected only by a finite number of initial points.

When the initial points are samples of a smooth function, the limit function of a convergent linear subdivision scheme approximates the sampled function. Thus, a convergent linear subdivision scheme is a linear approximation operator.

As examples, we give two prototypes of stationary schemes for $s = 1$. Each is the simplest of its kind, converging to $C^1$ univariate functions. The first is the Chaikin scheme [3], also called “corner cutting,” with limit functions which “preserve the shape” of the initial sets of points. The refinement rule is
$$(RP)_{2i} = \tfrac{3}{4} P_i + \tfrac{1}{4} P_{i+1}, \qquad (RP)_{2i+1} = \tfrac{1}{4} P_i + \tfrac{3}{4} P_{i+1}, \quad i \in \mathbb{Z}. \qquad (3)$$

The second is the 4-point scheme [6, 10], which interpolates the initial set of points (and all the sets of points generated by the scheme). It is defined by the refinement rule
$$(RP)_{2i} = P_i, \qquad (RP)_{2i+1} = \tfrac{9}{16}\,(P_i + P_{i+1}) - \tfrac{1}{16}\,(P_{i-1} + P_{i+2}), \quad i \in \mathbb{Z}. \qquad (4)$$

While the limit functions of the Chaikin scheme can be written in terms of B-splines of degree 2, the limit functions of the 4-point scheme for general initial sets of points have a fractal nature and are defined only procedurally by (4).
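To make the refinement rules concrete, here is a minimal sketch (ours, not part of the original entry; it assumes NumPy) of one binary refinement step of each scheme for curves ($s = 1$, points in $\mathbb{R}^d$):

import numpy as np

def chaikin_step(P):
    # Corner cutting (3): two new points on each edge of the control polygon
    Q = np.empty((2 * (len(P) - 1), P.shape[1]))
    Q[0::2] = 0.75 * P[:-1] + 0.25 * P[1:]
    Q[1::2] = 0.25 * P[:-1] + 0.75 * P[1:]
    return Q

def four_point_step(P):
    # Interpolatory 4-point rule (4); boundary points are omitted for brevity
    even = P[1:-2]                                   # (RP)_{2i} = P_i
    odd = (9.0 / 16.0) * (P[1:-2] + P[2:-1]) \
        - (1.0 / 16.0) * (P[:-3] + P[3:])            # (RP)_{2i+1}
    Q = np.empty((len(even) + len(odd), P.shape[1]))
    Q[0::2], Q[1::2] = even, odd
    return Q

# Repeated refinement of a control polygon in R^2
P = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 0.0], [3.0, 2.0]])
for _ in range(4):
    P = chaikin_step(P)

Iterating chaikin_step quickly produces a dense point set converging to a quadratic B-spline curve, while iterating four_point_step retains the original points and inserts new ones between them.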

For the design of surfaces in 3D, $s = 2$ and the common grids are $\mathbb{Z}^2$ and regular triangulations. The latter are refined by dividing each triangle into four equal ones. Yet regular grids (with each vertex belonging to four squares in the case of $\mathbb{Z}^2$ and to six triangles in the case of a regular triangulation) are not sufficient for representing surfaces of general topology, and a finite number of extraordinary points are required [13].

The analysis on regular grids of the convergence of a stationary linear scheme, and of the smoothness of the generated functions, is based on the coefficients of the mask. It requires the computation of the joint spectral radius of several finite dimensional matrices with mask coefficients as elements in specific positions (see, e.g., [2]) or an equivalent computation in terms of the Laurent polynomial $a(z) = \sum_{i \in \mathbb{Z}^s} a_i z^i$ (see, e.g., [7]).

When dealing with the design of surfaces, this analysis applies only in the parts of the grids away from the extraordinary points. The analysis at these points is local [13], but rather involved. It also dictates changes in the refinement rules that have to be made near extraordinary points [13, 17].

References

1. Amat, S., Dadourian, K., Liandrat, J.: On a nonlinear subdivision scheme avoiding Gibbs oscillations and converging towards C^s functions with s > 1. Math. Comput. 80, 959–971 (2011)
2. Cavaretta, A.S., Dahmen, W., Micchelli, C.A.: Stationary Subdivision. Memoirs of the American Mathematical Society, vol. 93, p. 186. American Mathematical Society, Providence (1991)
3. Chaikin, G.M.: An algorithm for high speed curve generation. Comput. Graph. Image Process. 3, 346–349 (1974)
4. Cohen, A., Dyn, N., Matei, B.: Quasilinear subdivision schemes with applications to ENO interpolation. Appl. Comput. Harmonic Anal. 15, 89–116 (2003)
5. Deng, C., Wang, G.: Incenter subdivision scheme for curve interpolation. Comput. Aided Geom. Des. 27, 48–59 (2010)
6. Dubuc, S.: Interpolation through an iterative scheme. J. Math. Anal. Appl. 114, 185–204 (1986)
7. Dyn, N.: Subdivision schemes in computer-aided geometric design. In: Light, W. (ed.) Advances in Numerical Analysis – Volume II, Wavelets, Subdivision Algorithms and Radial Basis Functions, p. 36. Clarendon, Oxford (1992)
8. Dyn, N., Hormann, K.: Geometric conditions for tangent continuity of interpolatory planar subdivision curves. Comput. Aided Geom. Des. 29, 332–347 (2012)
9. Dyn, N., Levin, D.: Subdivision schemes in geometric modelling. Acta Numer. 11, 73–144 (2002)
10. Dyn, N., Gregory, J., Levin, D.: A 4-point interpolatory subdivision scheme for curve design. Comput. Aided Geom. Des. 4, 257–268 (1987)
11. Dyn, N., Floater, M.S., Hormann, K.: Four-point curve subdivision based on iterated chordal and centripetal parametrizations. Comput. Aided Geom. Des. 26, 279–286 (2009)


12. Grohs, P.: A general proximity analysis of nonlinear subdivision schemes. SIAM J. Math. Anal. 42, 729–750 (2010)
13. Peters, J., Reif, U.: Subdivision Surfaces, p. 204. Springer, Berlin/Heidelberg (2008)
14. Wallner, J., Navayazdani, E., Weinmann, A.: Convergence and smoothness analysis of subdivision rules in Riemannian and symmetric spaces. Adv. Comput. Math. 34, 201–218 (2011)
15. Warren, J., Weimer, H.: Subdivision Methods for Geometric Design, p. 299. Morgan Kaufmann, San Francisco (2002)
16. Xie, G., Yu, T.: Smoothness equivalence properties of general manifold-valued data subdivision schemes. Multiscale Model. Simul. 7, 1073–1100 (2010)
17. Zorin, D.: Smoothness of stationary subdivision on irregular meshes. Constr. Approx. 16, 359–397 (2000)

Symbolic Computing

Ondřej Čertík 1, Mateusz Paprocki 2, Aaron Meurer 3, Brian Granger 4, and Thilina Rathnayake 5
1 Los Alamos National Laboratory, Los Alamos, NM, USA
2 refptr.pl, Wrocław, Poland
3 Department of Mathematics, New Mexico State University, Las Cruces, NM, USA
4 Department of Physics, California Polytechnic State University, San Luis Obispo, CA, USA
5 Department of Computer Science, University of Moratuwa, Moratuwa, Sri Lanka

Synonyms

Computer algebra system; Symbolic computing; Symbolic manipulation

Keywords

Symbolic computing; Computer algebra system

Glossary/Definition Terms

Numerical computing: Computing that is based on finite precision arithmetic.

Symbolic computing: Computing that uses symbols to manipulate and solve mathematical formulas and equations in order to obtain mathematically exact results.

Definition

Scientific computing can be generally divided into two subfields: numerical computation, which is based on finite precision arithmetic (usually single or double precision), and symbolic computing, which uses symbols to manipulate and solve mathematical equations and formulas in order to obtain mathematically exact results.

Symbolic computing, also called symbolic manipulation or computer algebra system (CAS), typically includes systems that can focus on well-defined computing areas such as polynomials, matrices, abstract algebra, number theory, or statistics (symbolic), as well as on calculus-like manipulation such as limits, differential equations, or integrals. A full-featured CAS should have all or most of the listed features. There are systems that focus only on one specific area like polynomials; those are often called CAS too.

Some authors claim that symbolic computing and computer algebra are two views of computing with mathematical objects [1]. According to them, symbolic computation deals with expression trees and addresses problems of determination of expression equivalence, simplification, and computation of canonical forms, while computer algebra is more centered around the computation of mathematical quantities in well-defined algebraic domains. The distinction between symbolic computing and computer algebra is often not made; the terms are used interchangeably. We will do so in this entry as well.

History

Algorithms for Computer Algebra [2] provides a concise description of the history of symbolic computation. The invention of LISP in the early 1960s had a great impact on the development of symbolic computation. FORTRAN and ALGOL, which existed at the time, were primarily designed for numerical computation. In 1961, James Slagle at MIT (Massachusetts Institute of Technology) wrote a heuristics-based LISP program for Symbolic Automatic INTegration (SAINT) [3]. In 1963, Martinus J. G. Veltman developed Schoonschip [4, 5] for particle physics calculations. In 1966, Joel Moses (also from MIT) wrote a program called SIN [6] for the same purpose as SAINT, but he used a more efficient algorithmic approach. In 1968, REDUCE [7] was developed by Tony Hearn at Stanford University for physics calculations. Also in 1968, a specialized CAS called CAMAL [8] for handling Poisson series in celestial mechanics was developed by John Fitch and David Barton from the University of Cambridge. In 1970, a general-purpose system called REDUCE 2 was introduced.

In 1971 Macsyma [9] was developed with capabilities for algebraic manipulation, limit calculations, symbolic integration, and the solution of equations. In the late 1970s muMATH [10] was developed by the University of Hawaii, and it came with its own programming language. It was the first CAS to run on widely available IBM PC computers. With the development of computing in the 1980s, more modern CASes began to emerge. Maple [11] was introduced by the University of Waterloo with a small compiled kernel and a large mathematical library, thus allowing it to be used powerfully on smaller platforms. In 1988, Mathematica [12] was developed by Stephen Wolfram with better graphical capabilities and integration with graphical user interfaces. In the 1980s more and more CASes were developed, like Macaulay [13], PARI [14], GAP [15], and CAYLEY [16] (which later became Magma [17]). With the popularization of open-source software in the past decade, many open-source CASes were developed, like Sage [18], SymPy [19], etc. Also, many of the existing CASes were later open sourced; for example, Macsyma became Maxima [20]; Scratchpad [21] became Axiom [22].

Overview

A common functionality of all computer algebra systems typically includes at least the features mentioned in the following subsections. We use SymPy 0.7.5 and Mathematica 9 as examples of doing the same operation in two different systems, but any other full-featured CAS can be used as well (e.g., from Table 1), and it should produce the same results functionally.

To run the SymPy examples in a Python session, execute the following first:

from sympy import *
x, y, z, n, m = symbols('x, y, z, n, m')
f = Function('f')

To run the Mathematica examples, just execute them in a Mathematica Notebook.

Arbitrary Formula Representation
One can represent arbitrary expressions (not just polynomials). SymPy:

In [1]: (1+1/x)**x
Out[1]: (1 + 1/x)**x
In [2]: sqrt(sin(x))/z
Out[2]: sqrt(sin(x))/z

Mathematica:

In[1]:= (1+1/x)^x
Out[1]= (1+1/x)^x
In[2]:= Sqrt[Sin[x]]/z
Out[2]= Sqrt[Sin[x]]/z

Symbolic Computing, Table 1 Implementation details of various computer algebra systems

Program                        License     Internal implementation language   CAS language
Mathematica [12]               Commercial  C/C++                              Custom
Maple [11]                     Commercial  C/C++                              Custom
Symbolic MATLAB toolbox [23]   Commercial  C/C++                              Custom
Axiom^a [22]                   BSD         Lisp                               Custom
SymPy [19]                     BSD         Python                             Python
Maxima [20]                    GPL         Lisp                               Custom
Sage [18]                      GPL         C++/Cython/Lisp                    Python^b
Giac/Xcas [24]                 GPL         C++                                Custom

^a The same applies to its two forks FriCAS [25] and OpenAxiom [26]
^b The default environment in Sage actually extends the Python language using a preparser that converts things like 2^3 into Integer(2)**Integer(3), but the preparser can be turned off and one can use Sage from a regular Python session as well



Limits
SymPy:

In [1]: limit(sin(x)/x, x, 0)
Out[1]: 1
In [2]: limit((2-sqrt(x))/(4-x), x, 4)
Out[2]: 1/4

Mathematica:

In[1]:= Limit[Sin[x]/x, x->0]
Out[1]= 1
In[2]:= Limit[(2-Sqrt[x])/(4-x), x->4]
Out[2]= 1/4

Differentiation
SymPy:

In [1]: diff(sin(2*x), x)
Out[1]: 2*cos(2*x)
In [2]: diff(sin(2*x), x, 10)
Out[2]: -1024*sin(2*x)

Mathematica:

In[1]:= D[Sin[2 x], x]
Out[1]= 2 Cos[2 x]
In[2]:= D[Sin[2 x], {x, 10}]
Out[2]= -1024 Sin[2 x]

Integration
SymPy:

In [1]: integrate(1/(x**2+1), x)
Out[1]: atan(x)
In [2]: integrate(1/(x**2+3), x)
Out[2]: sqrt(3)*atan(sqrt(3)*x/3)/3

Mathematica:

In[1]:= Integrate[1/(x^2+1), x]
Out[1]= ArcTan[x]
In[2]:= Integrate[1/(x^2+3), x]
Out[2]= ArcTan[x/Sqrt[3]]/Sqrt[3]

Polynomial Factorization
SymPy:

In [1]: factor(x**2*y + x**2*z + x*y**2 + 2*x*y*z + x*z**2 + y**2*z + y*z**2)
Out[1]: (x + y)*(x + z)*(y + z)

Mathematica:

In[1]:= Factor[x^2 y + x^2 z + x y^2 + 2 x y z + x z^2 + y^2 z + y z^2]
Out[1]= (x+y) (x+z) (y+z)

Algebraic and Differential Equation Solvers
Algebraic equations, SymPy:

In [1]: solve(x**4 + x**2 + 1, x)
Out[1]: [-1/2 - sqrt(3)*I/2, -1/2 + sqrt(3)*I/2,
         1/2 - sqrt(3)*I/2, 1/2 + sqrt(3)*I/2]

Mathematica:

In[1]:= Reduce[1 + x^2 + x^4 == 0, x]
Out[1]= x==-(-1)^(1/3) || x==(-1)^(1/3) || x==-(-1)^(2/3) || x==(-1)^(2/3)

and differential equations, SymPy:

In [1]: dsolve(f(x).diff(x, 2) + f(x), f(x))
Out[1]: f(x) == C1*sin(x) + C2*cos(x)
In [2]: dsolve(f(x).diff(x, 2) + 9*f(x), f(x))
Out[2]: f(x) == C1*sin(3*x) + C2*cos(3*x)

Mathematica:

In[1]:= DSolve[f''[x] + f[x] == 0, f[x], x]
Out[1]= {{f[x] -> C[1] Cos[x] + C[2] Sin[x]}}
In[2]:= DSolve[f''[x] + 9 f[x] == 0, f[x], x]
Out[2]= {{f[x] -> C[1] Cos[3 x] + C[2] Sin[3 x]}}

Formula Simplification
Simplification is not a well-defined operation (i.e., there are many ways to define the complexity of an expression), but typically the CAS is able to simplify, for example, the following expressions in an expected way. SymPy:

In [1]: simplify(-1/(2*(x**2 + 1)) - 1/(4*(x + 1)) + 1/(4*(x - 1)) - 1/(x**4 - 1))
Out[1]: 0

Mathematica:

In[1]:= Simplify[-1/(2(x^2+1)) - 1/(4(x+1)) + 1/(4(x-1)) - 1/(x^4-1)]
Out[1]= 0

or, SymPy:

In [1]: simplify((x - 1)/(x**2 - 1))
Out[1]: 1/(x + 1)

Mathematica:

In[1]:= Simplify[(x-1)/(x^2-1)]
Out[1]= 1/(1+x)

Numerical Evaluation
Exact expressions like $\sqrt{2}$, constants, sums, integrals, and symbolic expressions can be evaluated to a desired accuracy using a CAS. For example, in SymPy:

In [1]: N(sqrt(2), 30)
Out[1]: 1.41421356237309504880168872421
In [2]: N(Sum(1/n**n, (n, 1, oo)), 30)
Out[2]: 1.29128599706266354040728259060

Mathematica:

In[1]:= N[Sqrt[2], 30]
Out[1]= 1.41421356237309504880168872421
In[2]:= N[Sum[1/n^n, {n, 1, Infinity}], 30]
Out[2]= 1.29128599706266354040728259060

Symbolic Summation
There are circumstances where it is mathematically impossible to get an explicit formula for a given sum. When an explicit formula exists, getting the exact result is usually desirable. SymPy:

In [1]: Sum(n, (n, 1, m)).doit()
Out[1]: m**2/2 + m/2
In [2]: Sum(1/n**6, (n, 1, oo)).doit()
Out[2]: pi**6/945
In [3]: Sum(1/n**5, (n, 1, oo)).doit()
Out[3]: zeta(5)

Mathematica:

In[1]:= Sum[n, {n, 1, m}]
Out[1]= 1/2 m (1+m)
In[2]:= Sum[1/n^6, {n, 1, Infinity}]
Out[2]= Pi^6/945
In[3]:= Sum[1/n^5, {n, 1, Infinity}]
Out[3]= Zeta[5]

Software

A computer algebra system (CAS) is typically composed of a high-level (usually interpreted) language that the user interacts with in order to perform calculations. Many times the implementation of such a CAS is a mix of the high-level language together with some low-level language (like C or C++) for efficiency reasons. Some of them can easily be used as a library in user's programs; others can only be used from the custom CAS language.

A comprehensive list of computer algebra software is at [27]. Table 1 lists features of several established computer algebra systems. We have only included systems that can handle at least the problems mentioned in the Overview section.

Besides general full-featured CASes, there exist specialized packages, for example, Singular [28] for very fast polynomial manipulation or GiNaC [29], which can handle basic symbolic manipulation but does not have integration, advanced polynomial algorithms, or limits. Pari [14] is designed for number theory computations and Cadabra [30] for field theory calculations with tensors. Magma [17] specializes in algebra, number theory, algebraic geometry, and algebraic combinatorics.

Finally, a CAS also usually contains a notebook-like interface, which can be used to enter commands or programs, plot graphs, and show nicely formatted equations. For Python-based CASes, one can use the IPython Notebook [31] or the Sage Notebook [18], both of which are interactive web applications that can be used from a web browser. C++ CASes can be wrapped in Python; for example, GiNaC has several Python wrappers: Swiginac [32], Pynac [33], etc. These can then be used from Python-based notebooks. Mathematica and Maple also contain a notebook interface, which accepts the given CAS high-level language.


Applications of Symbolic Computing

Symbolic computing has traditionally had numerous applications. By the 1970s, many CASes were used for celestial mechanics, general relativity, quantum electrodynamics, and other applications [34]. In this section we present a few such applications in more detail, but necessarily our list is incomplete and is only meant as a starting point for the reader.

Many of the following applications and scientific advances related to them would not be possible without symbolic computing.

Code Generation
One of the frequent uses of a CAS is to derive some symbolic expression and then generate C or Fortran code that numerically evaluates it in a production high-performance code. For example, to obtain the best rational function approximation (of orders 8, 8) to a modified Bessel function of the first kind of half-integer argument $I_{9/2}(x)$ on an interval $[4, 10]$, one can use (in Mathematica):

In[1]:= Needs["FunctionApproximations`"]
In[2]:= FortranForm[HornerForm[MiniMaxApproximation[
          BesselI[9/2, x]*Sqrt[Pi*x/2]/Exp[x],
          {x, {4, 10}, 8, 8}, WorkingPrecision -> 30][[2, 1]]]]

Out[2]//FortranForm=
(0.000395502959013236968661582656143 +
  x*(-0.001434648369704841686633794071 +
  x*(0.00248783474583503473135143644434 +
  x*(-0.00274477921388295929464613063609 +
  x*(0.00216275018107657273725589740499 +
  x*(-0.000236779926184242197820134964535 +
  x*(0.0000882030507076791807159699814428 +
  (-4.62078105288798755556136693122e-6 +
    8.23671374777791529292655504214e-7*x)*x)))))))
/(1. + x*(0.504839286873735708062045336271 +
  x*(0.176683950009401712892997268723 +
  x*(0.0438594911840609324095487447279 +
  x*(0.00829753062428409331123592322788 +
  x*(0.00111693697900468156881720995034 +
  x*(0.000174719963536517752971223459247 +
  (7.22885338737473776714257581233e-6 +
    1.64737453771748367647332279826e-6*x)*x)))))))

The result can be readily used in a Fortran code (we reformatted the white space in the output Out[2] to better fit into the page).

Particle Physics
The application of symbolic computing in particle physics typically involves generation and then calculation of Feynman diagrams (among other things, this involves doing fast traces of Dirac gamma matrices and other tensor operations). The first CAS that was designed for this task was Schoonschip [4, 5], and in 1984 FORM [35] was created as a successor. FORM has built-in features for manipulating formulas in particle physics, but it can also be used as a general purpose system (it keeps all expressions in expanded form, so it cannot do factorization; it also does not have more advanced features like series expansion, differential equations, or integration).

Many of the scientific results in particle physics would not be possible without a powerful CAS; Schoonschip was used for calculating properties of the W boson in the 1960s, and FORM is still maintained and used to this day.

Another project is FeynCalc [36], originally written for Macsyma (Maxima) and later Mathematica. It is a package for algebraic calculations in elementary particle physics; among other things, it can do tensor and Dirac algebra manipulation, Lorentz index contraction, generation of Feynman rules from a Lagrangian, and Fortran code generation. There are hundreds of publications that used FeynCalc to perform calculations.

A similar project is FeynArts [37], which is also a Mathematica package that can generate and visualize Feynman diagrams and amplitudes. Those can then be calculated with a related project, FormCalc [38], built on top of FORM.

PyDy
PyDy, short for Python Dynamics, is a workflow that utilizes an array of scientific tools written in the Python programming language to study multi-body dynamics [39]. The SymPy mechanics package is used to generate symbolic equations of motion in complex multi-body systems, and several other scientific Python packages are used for numerical evaluation (NumPy [40]), visualization (Matplotlib [41]), etc. First, an idealized version of the system is represented (geometry, configuration, constraints, external forces). Then the symbolic equations of motion (often very long) are generated using the mechanics package and solved (integrated after setting numerical values for the parameters) using differential equation solvers in SciPy [42]. These solutions can then be used for simulations and visualizations. Symbolic equation generation guarantees no mistakes in the calculations and makes it easy to deal with complex systems with a large number of components.
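As a toy illustration of this style of workflow (ours, not taken from the PyDy documentation), the equation of motion of a planar pendulum can be generated symbolically with SymPy's Euler-Lagrange helper; PyDy itself relies on the more general sympy.physics.mechanics machinery:

from sympy import symbols, Function, diff, cos
from sympy.calculus.euler import euler_equations

t, m, l, g = symbols('t m l g', positive=True)
theta = Function('theta')

# Lagrangian L = T - V of a planar pendulum of length l and mass m
L = m*l**2*diff(theta(t), t)**2/2 + m*g*l*cos(theta(t))

# Euler-Lagrange equation: m*l**2*theta'' + m*g*l*sin(theta) = 0
print(euler_equations(L, [theta(t)], [t]))

The resulting symbolic equation can then be lambdified and handed to a SciPy integrator, exactly as in the PyDy workflow described above.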

General Relativity
In general relativity the CASes have traditionally been used to symbolically represent the metric tensor $g_{\mu\nu}$ and then use symbolic derivation to derive various tensors (Riemann and Ricci tensors, curvature, ...) that are present in Einstein's equations [34]:
$$R_{\mu\nu} - \frac{1}{2}\, g_{\mu\nu} R + g_{\mu\nu} \Lambda = \frac{8 \pi G}{c^4}\, T_{\mu\nu}. \qquad (1)$$
Those can then be solved for simple systems.

Celestial Mechanics
The equations in celestial mechanics are solved using perturbation theory, which requires very efficient manipulation of Poisson series [8, 34, 43–50]:
$$\sum P(a, b, c, \ldots, h) \left\{ {\sin \atop \cos} \right\} (\alpha u + \beta v + \cdots + \zeta z) \qquad (2)$$
where $P(a, b, c, \ldots, h)$ is a polynomial and each term contains either sin or cos. Using trigonometric relations, it can be shown that this form is closed under addition, subtraction, multiplication, differentiation, and restricted integration. One of the earliest specialized CASes for handling Poisson series is CAMAL [8]. Many others were since developed, for example, TRIP [43].

Quantum Mechanics
Quantum mechanics is well known for many tedious calculations, and the use of a CAS can aid in doing them. There have been several books published with many worked-out problems in quantum mechanics done using Mathematica and other CASes [51, 52].

There are specialized packages for doing computations in quantum mechanics; for example, SymPy has extensive capabilities for symbolic quantum mechanics in the sympy.physics.quantum subpackage. At the base level, this subpackage has Python objects to represent the different mathematical objects relevant in quantum theory [53]: states (bras and kets), operators (unitary, Hermitian, etc.), and basis sets, as well as operations on these objects such as tensor products, inner products, outer products, commutators, anticommutators, etc. The base objects are designed in the most general way possible to enable any particular quantum system to be implemented by subclassing the base operators to provide system-specific logic. There is a general purpose qapply function that is capable of applying operators to states symbolically as well as simplifying a wide range of symbolic expressions involving different types of products and commutators/anticommutators. The state and operator objects also have a rich API for declaring their representation in a particular basis. This includes the ability to specify a basis for a multidimensional system using a complete set of commuting Hermitian operators.

On top of this base set of objects, a number of specific quantum systems have been implemented. First, there is traditional algebra for quantum angular momentum [54]. This allows the different spin operators ($S_x$, $S_y$, $S_z$) and their eigenstates to be represented in any basis and for any spin quantum number. Facilities for Clebsch-Gordan coefficients, Wigner coefficients, rotations, and angular momentum coupling are also present in their symbolic and numerical forms. Other examples of particular quantum systems that are implemented include second quantization, the simple harmonic oscillator (position/momentum and raising/lowering forms), and continuous position/momentum-based systems.

Second, there is a full set of states and operators for symbolic quantum computing [55]. Multidimensional qubit states can be represented symbolically and as vectors. A full set of one-qubit ($X$, $Y$, $Z$, $H$, etc.) and two-qubit (CNOT, SWAP, CPHASE, etc.) gates (unitary operators) are provided. These can be represented as matrices (sparse or dense) or made to act on qubits symbolically without representation. With these gates, it is possible to implement a number of basic quantum circuits including the quantum Fourier transform, quantum error correction, quantum teleportation, Grover's algorithm, dense coding, etc.
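As a small example, a Bell state can be prepared symbolically with these gates (a sketch using the sympy.physics.quantum subpackage described above):

from sympy.physics.quantum.qapply import qapply
from sympy.physics.quantum.qubit import Qubit
from sympy.physics.quantum.gate import H, CNOT

# Hadamard on qubit 1, then CNOT with control 1 and target 0
bell = qapply(CNOT(1, 0) * H(1) * Qubit('00'))
print(bell)   # sqrt(2)/2*|00> + sqrt(2)/2*|11>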

There are other packages that specialize in quantum computing, for example, [56].

Number Theory
Number theory provides an important base for modern computing, especially in cryptography and coding theory [57]. For example, the LLL algorithm [58] is used in integer programming; primality testing and factoring algorithms are used in cryptography [59]. CASes are heavily used in these calculations.

The Riemann hypothesis [60, 61], which implies results about the distribution of prime numbers, has important applications in computational mathematics since it can be used to estimate how long certain algorithms take to run [61]. The Riemann hypothesis states that all nontrivial zeros of the Riemann zeta function, defined for a complex variable $s$ in the half-plane $\Re(s) > 1$ by the absolutely convergent series $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$, have real part equal to $\frac{1}{2}$. In 1986, this was proven for the first 1,500,000,001 nontrivial zeros using computational methods [62]. Sebastian Wedeniwski, using ZettaGrid (a distributed computing project to find roots of the zeta function), verified the result for the first 400 billion zeros in 2005 [63].
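Locating and checking nontrivial zeros to high precision is routine with arbitrary-precision tools; for example, with the mpmath library (used internally by SymPy), a toy version of this computation is:

import mpmath

mpmath.mp.dps = 30                 # 30 significant digits
rho = mpmath.zetazero(1)           # first nontrivial zero, 0.5 + 14.1347...j
print(rho)
print(mpmath.zeta(rho))            # approximately 0, up to round-off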

Teaching Calculus and Other Classes
Computer algebra systems are extremely useful for teaching calculus [64] as well as other classes where tedious symbolic algebra is needed, such as many physics classes (general relativity, quantum mechanics and field theory, symbolic solutions to partial differential equations, e.g., in electromagnetism, fluids, plasmas, electrical circuits, etc.) [65].

Experimental Mathematics
One field that would not be possible at all without computer algebra systems is called “experimental mathematics” [66], where CASes and related tools are used to “experimentally” verify or suggest mathematical relations. For example, the famous Bailey–Borwein–Plouffe (BBP) formula
$$\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right) \qquad (3)$$

was first discovered experimentally (using arbitrary-precision arithmetic and extensive searching using an integer relation algorithm) and only then proved rigorously [67].
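Verifying such a relation numerically to high precision is straightforward with arbitrary-precision arithmetic (a sketch with mpmath; the original discovery additionally required an integer relation algorithm such as PSLQ):

import mpmath

mpmath.mp.dps = 50
term = lambda k: (mpmath.mpf(4)/(8*k + 1) - mpmath.mpf(2)/(8*k + 4)
                  - mpmath.mpf(1)/(8*k + 5) - mpmath.mpf(1)/(8*k + 6)) / 16**k
bbp = mpmath.nsum(term, [0, mpmath.inf])
print(bbp - mpmath.pi)    # ~1e-50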

Another example is in [68], where the authors first numerically discovered and then proved that for rational $x$, $y$, the 2D Poisson potential function satisfies
$$\phi(x, y) = \frac{1}{\pi^2} \sum_{a, b \text{ odd}} \frac{\cos(a \pi x)\,\cos(b \pi y)}{a^2 + b^2} = \frac{1}{\pi} \log \alpha, \qquad (4)$$
where $\alpha$ is algebraic (a root of an integer polynomial).

Acknowledgements This work was performed under the auspices of the US Department of Energy by Los Alamos National Laboratory.

References

1. Watt, S.M.: Making computer algebra more symbolic. In: Dumas, J.-G. (ed.) Proceedings of Transgressive Computing 2006: A Conference in Honor of Jean Della Dora, Facultad de Ciencias, Universidad de Granada, pp. 43–49, April 2006
2. Geddes, K.O., Czapor, S.R., Labahn, G.: Algorithms for computer algebra. In: Introduction to Computer Algebra. Kluwer Academic, Boston (1992)


3. Slagle, J.R.: A heuristic program that solves symbolic integration problems in freshman calculus. J. ACM 10(4), 507–520 (1963)
4. Veltman, M.J.G.: From weak interactions to gravitation. Int. J. Mod. Phys. A 15(29), 4557–4573 (2000)
5. Strubbe, H.: Manual for SCHOONSCHIP: a CDC 6000/7000 program for symbolic evaluation of algebraic expressions. Comput. Phys. Commun. 8(1), 1–30 (1974)
6. Moses, J.: Symbolic integration. PhD thesis, MIT (1967)
7. REDUCE: A portable general-purpose computer algebra system. http://reduce-algebra.sourceforge.net/ (2013)
8. Fitch, J.: CAMAL 40 years on – is small still beautiful? Intell. Comput. Math., Lect. Notes Comput. Sci. 5625, 32–44 (2009)
9. Moses, J.: Macsyma: a personal history. J. Symb. Comput. 47(2), 123–130 (2012)
10. Shochat, D.D.: Experience with the musimp/mumath-80 symbolic mathematics system. ACM SIGSAM Bull. 16(3), 16–23 (1982)

11. Bernardin, L., Chin, P., DeMarco, P., Geddes, K.O., Hare, D.E.G., Heal, K.M., Labahn, G., May, J.P., McCarron, J., Monagan, M.B., Ohashi, D., Vorkoetter, S.M.: Maple Programming Guide. Maplesoft, Waterloo (2011)
12. Wolfram, S.: The Mathematica Book. Wolfram Research Inc., Champaign (2000)
13. Grayson, D.R., Stillman, M.E.: Macaulay 2, a software system for research in algebraic geometry. Available at http://www.math.uiuc.edu/Macaulay2/
14. The PARI Group, Bordeaux: PARI/GP version 2.7.0. Available from http://pari.math.u-bordeaux.fr/ (2014)
15. The GAP Group: GAP – groups, algorithms, and programming, version 4.4. http://www.gap-system.org (2003)
16. Cannon, J.J.: An introduction to the group theory language Cayley. Comput. Group Theory 145, 183 (1984)
17. Bosma, W., Cannon, J., Playoust, C.: The Magma algebra system. I. The user language. J. Symb. Comput. 24(3–4), 235–265 (1997). Computational Algebra and Number Theory, London (1993)
18. Stein, W.A., et al.: Sage Mathematics Software (Version 6.4.1). The Sage Development Team (2014). http://www.sagemath.org

19. SymPy Development Team: SymPy: Python library for symbolic mathematics. http://www.sympy.org (2013)
20. Maxima: Maxima, a computer algebra system. Version 5.34.1. http://maxima.sourceforge.net/ (2014)
21. Jenks, R.D., Sutor, R.S., Watt, S.M.: Scratchpad II: an abstract datatype system for mathematical computation. In: Janßen, R. (ed.) Trends in Computer Algebra. Volume 296 of Lecture Notes in Computer Science, pp. 12–37. Springer, Berlin/Heidelberg (1988)
22. Jenks, R.D., Sutor, R.S.: Axiom – The Scientific Computation System. Springer, New York/Berlin (1992)
23. MATLAB: Symbolic math toolbox. http://www.mathworks.com/products/symbolic/ (2014)
24. Parisse, B.: Giac/Xcas, a free computer algebra system. Technical report, University of Grenoble (2008)
25. FriCAS, an advanced computer algebra system: http://fricas.sourceforge.net/ (2015)
26. OpenAxiom, the open scientific computation platform: http://www.open-axiom.org/ (2015)
27. Wikipedia: List of computer algebra systems. http://en.wikipedia.org/wiki/List_of_computer_algebra_systems (2013)

28. Decker, W., Greuel, G.-M., Pfister, G., Schönemann, H.: SINGULAR 3-1-6 – a computer algebra system for polynomial computations. http://www.singular.uni-kl.de (2012)
29. Bauer, C., Frink, A., Kreckel, R.: Introduction to the GiNaC framework for symbolic computation within the C++ programming language. J. Symb. Comput. 33(1), 1–12 (2002)
30. Peeters, K.: Introducing Cadabra: a symbolic computer algebra system for field theory problems. arXiv:hep-th/0701238; A field-theory motivated approach to symbolic computer algebra. Comput. Phys. Commun. 176, 550 (2007). [arXiv:cs/0608005]
31. Pérez, F., Granger, B.E.: IPython: a system for interactive scientific computing. Comput. Sci. Eng. 9(3), 21–29 (2007)
32. Swiginac, Python interface to GiNaC: http://sourceforge.net/projects/swiginac.berlios/ (2015)
33. Pynac, derivative of GiNaC with Python wrappers: http://pynac.org/ (2015)
34. Barton, D., Fitch, J.P.: Applications of algebraic manipulation programs in physics. Rep. Prog. Phys. 35(1), 235–314 (1972)
35. Vermaseren, J.A.M.: New features of FORM. Math. Phys. e-prints (2000). arXiv:math-ph/0010025

36. Mertig, R., Böhm, M., Denner, A.: FeynCalc – computer-algebraic calculation of Feynman amplitudes. Comput. Phys. Commun. 64(3), 345–359 (1991)
37. Hahn, T.: Generating Feynman diagrams and amplitudes with FeynArts 3. Comput. Phys. Commun. 140(3), 418–431 (2001)
38. Hahn, T., Pérez-Victoria, M.: Automated one-loop calculations in four and d dimensions. Comput. Phys. Commun. 118(2–3), 153–165 (1999)
39. Gede, G., Peterson, D.L., Nanjangud, A.S., Moore, J.K., Hubbard, M.: Constrained multibody dynamics with Python: from symbolic equation generation to publication. In: ASME 2013 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, pp. V07BT10A051–V07BT10A051. American Society of Mechanical Engineers, San Diego (2013)
40. NumPy: The fundamental package for scientific computing in Python. http://www.numpy.org/ (2015)
41. Hunter, J.D.: Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95 (2007)
42. Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: open source scientific tools for Python (2001–). http://www.scipy.org/
43. Gastineau, M., Laskar, J.: Development of TRIP: fast sparse multivariate polynomial multiplication using burst tries. In: Computational Science – ICCS 2006, University of Reading, pp. 446–453 (2006)
44. Biscani, F.: Parallel sparse polynomial multiplication on modern hardware architectures. In: Proceedings of the 37th International Symposium on Symbolic and Algebraic Computation – ISSAC '12, Grenoble, pp. 83–90 (2012)
45. Shelus, P.J., Jefferys III, W.H.: A note on an attempt at more efficient Poisson series evaluation. Celest. Mech. 11(1), 75–78 (1975)
46. Jefferys, W.H.: A FORTRAN-based list processor for Poisson series. Celest. Mech. 2(4), 474–480 (1970)


47. Broucke, R., Garthwaite, K.: A programming system for analytical series expansions on a computer. Celest. Mech. 1(2), 271–284 (1969)
48. Fateman, R.J.: On the multiplication of Poisson series. Celest. Mech. 10(2), 243–247 (1974)
49. Danby, J.M.A., Deprit, A., Rom, A.R.M.: The symbolic manipulation of Poisson series. In: SYMSAC '66 Proceedings of the First ACM Symposium on Symbolic and Algebraic Manipulation, New York, pp. 0901–0934 (1965)
50. Bourne, S.R.: Literal expressions for the co-ordinates of the moon. Celest. Mech. 6(2), 167–186 (1972)
51. Feagin, J.M.: Quantum Methods with Mathematica. Springer, New York (2002)
52. Steeb, W.-H., Hardy, Y.: Quantum Mechanics Using Computer Algebra: Includes Sample Programs in C++, SymbolicC++, Maxima, Maple, and Mathematica. World Scientific, Singapore (2010)
53. Sakurai, J.J., Napolitano, J.J.: Modern Quantum Mechanics. Addison-Wesley, Boston (2010)
54. Zare, R.N.: Angular Momentum: Understanding Spatial Aspects in Chemistry and Physics. Wiley, New York (1991)
55. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2011)
56. Gerdt, V.P., Kragler, R., Prokopenya, A.N.: A Mathematica Package for Simulation of Quantum Computation. Lecture Notes in Computer Science, vol. 5743 LNCS, pp. 106–117. Springer, Berlin/Heidelberg (2009)
57. Shoup, V.: A Computational Introduction to Number Theory and Algebra. Cambridge University Press, Cambridge (2009)
58. Lenstra, A.K., Lenstra, H.W., Lovász, L.: Factoring polynomials with rational coefficients. Math. Ann. 261(4), 515–534 (1982)
59. Cohen, H.: A Course in Computational Algebraic Number Theory, vol. 138. Springer, Berlin/New York (1993)
60. Titchmarsh, E.C.: The Theory of the Riemann Zeta-Function, vol. 196. Oxford University Press, Oxford (1951)
61. Mazur, B., Stein, W.: Prime Numbers and the Riemann Hypothesis. Cambridge University Press, Cambridge (2015)
62. van de Lune, J., te Riele, H.J.J., Winter, D.T.: On the zeros of the Riemann zeta function in the critical strip. IV. Math. Comput. 46(174), 667–681 (1986)
63. Wells, D.: Prime Numbers: The Most Mysterious Figures in Math. Wiley, Hoboken (2011)
64. Palmiter, J.R.: Effects of computer algebra systems on concept and skill acquisition in calculus. J. Res. Math. Educ. 22, 151–156 (1991)
65. Bing, T.J., Redish, E.F.: Symbolic manipulators affect mathematical mindsets. Am. J. Phys. 76(4), 418–424 (2008)
66. Bailey, D.H., Borwein, J.M.: Experimental mathematics: examples, methods and implications. Not. AMS 52, 502–514 (2005)
67. Bailey, D.H., Borwein, P., Plouffe, S.: On the rapid computation of various polylogarithmic constants. Math. Comput. 66(218), 903–913 (1996)
68. Bailey, D.H., Borwein, J.M.: Lattice sums arising from the Poisson equation. J. Phys. A: Math. Theor. 3, 1–18 (2013)

Symmetric Methods

Philippe Chartier
INRIA-ENS Cachan, Rennes, France

Synonyms

Time reversible

Definition

This entry is concerned with symmetric methods for solving ordinary differential equations (ODEs) of the form
$$\dot{y} = f(y) \in \mathbb{R}^n, \quad y(0) = y_0. \qquad (1)$$
Throughout this article, we denote by $\varphi_{t,f}(y_0)$ the flow of equation (1) with vector field $f$, i.e., the exact solution at time $t$ with initial condition $y(0) = y_0$, and we assume that the conditions for its well-definiteness and smoothness for $(y_0, |t|)$ in an appropriate subset $\Omega$ of $\mathbb{R}^n \times \mathbb{R}^+$ are satisfied. Numerical methods for (1) implement numerical flows $\Phi_{h,f}$ which, for small enough stepsizes $h$, approximate $\varphi_{h,f}$. Of central importance in the context of symmetric methods is the concept of adjoint method.

Definition 1 The adjoint method $\Phi^*_{h,f}$ is the inverse of $\Phi_{t,f}$ with reversed time step $-h$:
$$\Phi^*_{h,f} := \Phi^{-1}_{-h,f} \qquad (2)$$
A numerical method $\Phi_h$ is then said to be symmetric if $\Phi_{h,f} = \Phi^*_{h,f}$.

Overview

Symmetry is an essential property of numerical methods with regard to the order of accuracy and geometric properties of the solution. We briefly discuss the implications of these two aspects and refer to the corresponding sections for a more involved presentation:
• A method $\Phi_{h,f}$ is said to be of order $p$ if
$$\Phi_{h,f}(y) = \varphi_{h,f}(y) + O(h^{p+1}),$$

[Symmetric Methods, Fig. 1: commutative diagrams expressing the ρ-reversibility of $f$, $\varphi_{t,f}$, and $\Phi_{h,f}$]

and, if the local error has the following first-term expansion
$$\Phi_{h,f}(y) = \varphi_{h,f}(y) + h^{p+1} C(y) + O(h^{p+2}),$$
then straightforward application of the implicit function theorem leads to
$$\Phi^*_{h,f}(y) = \varphi_{h,f}(y) - (-h)^{p+1} C(y) + O(h^{p+2}).$$
This implies that a symmetric method is necessarily of even order $p = 2q$, since $\Phi_{h,f}(y) = \Phi^*_{h,f}(y)$ means that $(1 + (-1)^{p+1})\, C(y) = 0$. This property plays a key role in the construction of composition methods by triple jump techniques (see section on “Symmetric Methods Obtained by Composition”), and it is certainly no coincidence that Runge-Kutta methods of optimal order (Gauss methods) are symmetric (see section on “Symmetric Methods of Runge-Kutta Type”). It also explains why symmetric methods are used in conjunction with (Richardson) extrapolation techniques.
• The exact flow $\varphi_{t,f}$ is itself symmetric owing to the group property $\varphi_{s+t,f} = \varphi_{s,f} \circ \varphi_{t,f}$. Consider now an isomorphism $\rho$ of the vector space $\mathbb{R}^n$ (the phase space of (1)) and assume that the vector field $f$ satisfies the relation $\rho \circ f = -f \circ \rho$ (see Fig. 1). Then $\varphi_{t,f}$ is said to be $\rho$-reversible, that is to say the following equality holds:
$$\rho \circ \varphi_{t,f} = \varphi^{-1}_{t,f} \circ \rho \qquad (3)$$

Example 1 Hamiltonian systems
$$\dot{y} = \frac{\partial H}{\partial z}(y, z), \qquad \dot{z} = -\frac{\partial H}{\partial y}(y, z)$$
with a Hamiltonian function $H(y, z)$ satisfying $H(y, -z) = H(y, z)$ are $\rho$-reversible for $\rho(y, z) = (y, -z)$.

Definition 2 A method $\Phi_h$, applied to a $\rho$-reversible ordinary differential equation, is said to be $\rho$-reversible if
$$\rho \circ \Phi_{h,f} = \Phi^{-1}_{h,f} \circ \rho.$$

Note that if $\Phi_{h,f}$ is symmetric, it is $\rho$-reversible if and only if the following condition holds:
$$\rho \circ \Phi_{h,f} = \Phi^*_{h,f} \circ \rho. \qquad (4)$$
Besides, if (4) holds for an invertible $\rho$, then $\Phi_{h,f}$ is $\rho$-reversible if and only if it is symmetric.

Example 2 The trapezoidal rule, whose flow is defined by the implicit equation
$$\Phi_{h,f}(y) = y + h f\!\left( \frac{1}{2} y + \frac{1}{2} \Phi_{h,f}(y) \right), \qquad (5)$$
is symmetric and is $\rho$-reversible when applied to $\rho$-reversible $f$.

Since most numerical methods satisfy relation (4), symmetry is the required property for numerical methods to share with the exact flow not only time reversibility but also $\rho$-reversibility. This illustrates that a symmetric method mimics geometric properties of the exact flow. Modified differential equations further sustain this assertion (see next section) and allow for the derivation of deeper results for integrable reversible systems, such as the preservation of invariants and the linear growth of errors by symmetric methods (see section on “Reversible Kolmogorov-Arnold-Moser Theory”).

Modified Equations for Symmetric Methods

Constant stepsize backward error analysis. Considering a numerical method $\Phi_h$ (not necessarily symmetric) and the sequence of approximations obtained by application of the formula $y_{n+1} = \Phi_{h,f}(y_n)$, $n = 0, 1, 2, \ldots$, from the initial value $y_0$, the idea of backward error analysis consists in searching for a modified vector field $f^N_h$ such that
$$\varphi_{h, f^N_h}(y_0) = \Phi_{h,f}(y_0) + O(h^{N+2}), \qquad (6)$$


where the modified vector field, uniquely defined by a Taylor expansion of (6), is of the form
$$f^N_h(y) = f(y) + h f_1(y) + h^2 f_2(y) + \cdots + h^N f_N(y). \qquad (7)$$

Theorem 1 The modified vector field of a symmetric method $\Phi_{h,f}$ has an expansion in even powers of $h$, i.e., $f_{2j+1} \equiv 0$ for $j = 0, 1, \ldots$ Moreover, if $f$ and $\Phi_{h,f}$ are $\rho$-reversible, then $f^N_h$ is $\rho$-reversible as well for any $N \ge 0$.

Proof. Reversing the time step $h$ in (6) and taking the inverse of both sides, we obtain
$$\left( \varphi_{-h, f^N_{-h}} \right)^{-1}(y_0) = \left( \Phi_{-h,f} \right)^{-1}(y_0) + O(h^{N+2}).$$
Now, the group property of exact flows implies that $\left( \varphi_{-h, f^N_{-h}} \right)^{-1}(y_0) = \varphi_{h, f^N_{-h}}(y_0)$, so that
$$\varphi_{h, f^N_{-h}}(y_0) = \Phi^*_{h,f}(y_0) + O(h^{N+2}),$$
and by uniqueness, $(f^N_h)^* = f^N_{-h}$. This proves the first statement. Assume now that $f$ is $\rho$-reversible, so that (4) holds. It follows from $f^N_{-h} = f^N_h$ that
$$\rho \circ \varphi_{-h, f^N_h} = \rho \circ \varphi_{-h, f^N_{-h}} = \rho \circ \Phi^*_{h,f} = \Phi_{h,f} \circ \rho = \varphi_{h, f^N_h} \circ \rho,$$
where the second and last equalities are valid up to $O(h^{N+2})$-error terms. Yet the group property then implies that $\rho \circ \varphi_{-nh, f^N_h} = \varphi_{nh, f^N_h} \circ \rho + O_n(h^{N+2})$, where the constant in the $O_n$-term depends on $n$, and an interpolation argument shows that for fixed $N$ and small $|t|$
$$\rho \circ \varphi_{-t, f^N_h} = \varphi_{t, f^N_h} \circ \rho + O(h^{N+1}),$$
where the $O$-term depends smoothly on $t$ and on $N$. Finally, differentiating with respect to $t$, we obtain
$$-\rho \circ f^N_h = \frac{d}{dt} \left. \rho \circ \varphi_{-t, f^N_h} \right|_{t=0} = \frac{d}{dt} \left. \varphi_{t, f^N_h} \circ \rho \right|_{t=0} + O(h^{N+1}) = f^N_h \circ \rho + O(h^{N+1}),$$
and consequently $-\rho \circ f^N_h = f^N_h \circ \rho$. □

Remark 1 The expansion (7) of the modified vector field $f^N_h$ can be computed explicitly at any order $N$ with the substitution product of B-series [2].

Example 3 Consider the Lotka-Volterra equations in Poisson form
$$\begin{pmatrix} \dot{u} \\ \dot{v} \end{pmatrix} = \begin{pmatrix} 0 & uv \\ -uv & 0 \end{pmatrix} \begin{pmatrix} \nabla_u H(u,v) \\ \nabla_v H(u,v) \end{pmatrix}, \qquad H(u,v) = \log(u) + \log(v) - u - v,$$
i.e., $y' = f(y)$ with $f(y) = (u(1-v),\, v(u-1))^T$. Note that $\rho \circ f = -f \circ \rho$ with $\rho(u,v) = (v,u)$. The modified vector fields $f^2_{h,iE}$ for the implicit Euler method and $f^2_{h,mr}$ for the implicit midpoint rule read (with $N = 2$)
$$f^2_{h,iE} = f + \frac{h}{2}\, f'f + \frac{h^2}{12}\, f''(f,f) + \frac{h^2}{3}\, f'f'f$$
and
$$f^2_{h,mr} = f - \frac{h^2}{24}\, f''(f,f) + \frac{h^2}{12}\, f'f'f.$$

The exact solutions of the modified ODEs are plotted on Fig. 2 together with the corresponding numerical solutions. Though the modified vector fields are truncated only at second order, the agreement is excellent. The difference of the behavior of the two solutions is also striking: only the symmetric method captures the periodic nature of the solution. (The good behavior of the midpoint rule cannot be attributed to its symplecticity since the system is a noncanonical Poisson system.) This will be further explored in the next section.
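A minimal sketch of the underlying experiment (ours; plain fixed-point iteration stands in for a proper nonlinear solver) is:

import numpy as np

def f(y):
    u, v = y
    return np.array([u * (1.0 - v), v * (u - 1.0)])

def implicit_euler_step(y, h, iters=50):
    z = y.copy()
    for _ in range(iters):                  # solve z = y + h f(z)
        z = y + h * f(z)
    return z

def implicit_midpoint_step(y, h, iters=50):
    z = y.copy()
    for _ in range(iters):                  # solve z = y + h f((y+z)/2)
        z = y + h * f(0.5 * (y + z))
    return z

H = lambda y: np.log(y[0]) + np.log(y[1]) - y[0] - y[1]

y_ie = y_mr = np.array([1.0, 1.8])
h = 0.05
for _ in range(2000):
    y_ie = implicit_euler_step(y_ie, h)
    y_mr = implicit_midpoint_step(y_mr, h)
print(H(y_ie), H(y_mr))

The invariant $H$ (constant along exact trajectories) drifts under the nonsymmetric implicit Euler method, whereas the symmetric midpoint rule keeps it nearly constant, in agreement with Fig. 2.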

Variable stepsize backward error analysis. In practice, it is often fruitful to resort to variable stepsize implementations of the numerical flow $\Phi_{h,f}$. In accordance with [17], we consider stepsizes that are proportional to a function $s(y, \varepsilon)$ depending only on the current state $y$ and on a parameter $\varepsilon$ prescribed by the user and aimed at controlling the error. The approximate solution is then given by
$$y_{n+1} = \Phi_{\varepsilon s(y_n, \varepsilon), f}(y_n), \quad n = 0, \ldots$$
A remarkable feature of this algorithm is that it preserves the symmetry of the exact solution as soon as $\Phi_{h,f}$ is symmetric and $s$ satisfies the relation
$$s(\Phi_{\varepsilon s(y,\varepsilon), f}(y), -\varepsilon) = s(y, \varepsilon)$$

[Symmetric Methods, Fig. 2: Exact solutions of the modified equations (red lines) versus numerical solutions by the implicit Euler method and the implicit midpoint rule (blue points); axes $u$ (horizontal) and $v$ (vertical)]

and preserves the $\rho$-reversibility as soon as $\Phi_{h,f}$ is $\rho$-reversible and $s$ satisfies the relation
$$s(\rho^{-1} \circ \Phi_{\varepsilon s(y,\varepsilon), f}(y), -\varepsilon) = s(y, \varepsilon).$$
A result similar to Theorem 1 then holds with $h$ replaced by $\varepsilon$.

Remark 2 A recipe to construct such a function $s$, suggested by Stoffer in [17], consists in requiring that the local error estimate is kept constantly equal to a tolerance parameter. For the details of the implementation, we refer to the original paper or to Chap. VIII.3 of [10].

Reversible Kolmogorov-Arnold-Moser Theory

The theory of integrable Hamiltonian systems has its counterpart for reversible integrable ones. A reversible system
$$\dot{y} = f(y, z), \quad \dot{z} = g(y, z), \quad \text{where } \rho \circ (f, g) = -(f, g) \circ \rho \text{ with } \rho(y, z) = (y, -z), \qquad (8)$$

is reversible integrable if it can be brought, through a reversible transformation $(a, \theta) = (I(y,z), \Theta(y,z))$, to the canonical equations
$$\dot{a} = 0, \quad \dot{\theta} = \omega(a).$$

An interesting instance is the case of completely integrable Hamiltonian systems:
$$\dot{y} = \frac{\partial H}{\partial z}(y, z), \quad \dot{z} = -\frac{\partial H}{\partial y}(y, z),$$
with first integrals $I_j$'s in involution (that is to say such that $(\nabla_y I_i) \cdot (\nabla_z I_j) = (\nabla_z I_i) \cdot (\nabla_y I_j)$ for all $i, j$) such that $I_j \circ \rho = I_j$. In the conditions where the Arnold-Liouville theorem (see Chap. X.1.3 of [10]) can be applied, then, under the additional assumption that
$$\exists\, (y^*, 0) \in \{(y, z);\ \forall j,\ I_j(y, z) = I_j(y_0, z_0)\}, \qquad (9)$$

such a system is reversible integrable. In this situation, $\rho$-reversible methods constitute a very interesting alternative to symplectic methods, as the following result shows:

Theorem 2 Let $\Phi_{h,(f,g)}$ be a reversible numerical method of order $p$ applied to an integrable reversible system (8) with real-analytic $f$ and $g$. Consider $a^* = (I_1(y^*, z^*), \ldots, I_d(y^*, z^*))$. If the condition
$$\forall k \in \mathbb{Z}^d \setminus \{0\}, \quad |k \cdot \omega(a^*)| \ge \gamma \left( \sum_{i=1}^{d} |k_i| \right)^{-\nu}$$
is satisfied for some positive constants $\gamma$ and $\nu$, then there exist positive constants $C$, $c$, and $h_0$ such that the following assertion holds:


$$\forall h \le h_0,\ \forall (y_0, z_0) \text{ such that } \max_{j=1,\ldots,d} |I_j(y_0, z_0) - a^*_j| \le c\, |\log h|^{-\nu-1}: \qquad (10)$$
$$\forall t = nh \le h^{-p}, \quad \begin{cases} \left\| \Phi^n_{h,(f,g)}(y_0, z_0) - (y(t), z(t)) \right\| \le C\, t\, h^p, \\ \left| I_j\!\left( \Phi^n_{h,(f,g)}(y_0, z_0) \right) - I_j(y_0, z_0) \right| \le C\, h^p \ \text{for all } j. \end{cases}$$

Analogously to symplectic methods, $\rho$-reversible methods thus preserve invariant tori $I_j = \mathrm{cst}$ over long intervals of time, and the error growth is linear in $t$. Remarkably, and in contrast with symplectic methods, this result remains valid for reversible variable stepsize implementations (see Chap. XI.3 of [10]). However, it is important to note that for a Hamiltonian reversible system, the Hamiltonian ceases to be preserved when condition (9) is not fulfilled. This situation is illustrated on Fig. 3 for the Hamiltonian system with $H(q, p) = \frac{1}{2} p^2 + \cos(q) + \frac{1}{5} \sin(2q)$, an example borrowed from [4].

Symmetric Methods of Runge-Kutta Type

Runge-Kutta methods form a popular class of numerical integrators for (1). Owing to their importance in applications, we consider general systems (1) and subsequently partitioned systems.

Methods for general systems. We start with the following:

Definition 3 Consider a matrix $A = (a_{i,j}) \in \mathbb{R}^{s \times s}$ and a vector $b = (b_j) \in \mathbb{R}^s$. The Runge-Kutta method denoted $(A, b)$ is defined by
$$Y_i = y + h \sum_{j=1}^{s} a_{i,j} f(Y_j), \quad i = 1, \ldots, s, \qquad (11)$$
$$\tilde{y} = y + h \sum_{j=1}^{s} b_j f(Y_j). \qquad (12)$$

Note that, strictly speaking, the method is properly defined only for small $|h|$. In this case, the corresponding numerical flow $\Phi_{h,f}$ maps $y$ to $\tilde{y}$. Vector $Y_i$ approximates the solution at the intermediate point $t_0 + c_i h$, where $c_i = \sum_j a_{i,j}$, and it is customary since [1] to represent a method by its tableau:
$$\begin{array}{c|ccc} c_1 & a_{1,1} & \cdots & a_{1,s} \\ \vdots & \vdots & & \vdots \\ c_s & a_{s,1} & \cdots & a_{s,s} \\ \hline & b_1 & \cdots & b_s \end{array} \qquad (13)$$

Runge-Kutta methods automatically satisfy the $\rho$-compatibility condition (4): changing $h$ into $-h$ in (11) and (12), we have indeed, by linearity of $\rho$ and by using $\rho \circ f = -f \circ \rho$,
$$\rho(Y_i) = \rho(y) - h \sum_{j=1}^{s} a_{i,j} f\big(\rho(Y_j)\big), \quad i = 1, \ldots, s,$$
$$\rho(\tilde{y}) = \rho(y) - h \sum_{j=1}^{s} b_j f\big(\rho(Y_j)\big).$$

By construction, this is $\rho(\Phi_{-h,f}(y))$ and by previous definition $\Phi_{h,f}(\rho(y))$. As a consequence, $\rho$-reversible Runge-Kutta methods coincide with symmetric methods. Nevertheless, symmetry requires an additional algebraic condition stated in the next theorem:

Theorem 3 A Runge-Kutta method $(A, b)$ is symmetric if
$$PA + AP = e b^T \quad \text{and} \quad b = Pb, \qquad (14)$$
where $e = (1, \ldots, 1)^T \in \mathbb{R}^s$ and $P$ is the permutation matrix defined by $p_{i,j} = \delta_{i, s+1-j}$.

Proof. Denoting $Y = (Y_1^T, \ldots, Y_s^T)^T$ and $F(Y) = (f(Y_1)^T, \ldots, f(Y_s)^T)^T$, a more compact form for (11) and (12) is
$$Y = e \otimes y + h (A \otimes I) F(Y), \qquad (15)$$
$$\tilde{y} = y + h (b^T \otimes I) F(Y). \qquad (16)$$

[Symmetric Methods, Fig. 3: Level sets of $H$ (left) and evolution of $H$ with respect to time (right) for two initial values, $q_0 = 0$, $p_0 = 2.034$ and $q_0 = 0$, $p_0 = 2.033$]

On the one hand, premultiplying (15) by $P \otimes I$ and noticing that
$$(P \otimes I) F(Y) = F\big((P \otimes I) Y\big),$$
it is straightforward to see that $\Phi_{h,f}$ can also be defined by the coefficients $PAP^T$ and $Pb$. On the other hand, exchanging $h$ and $-h$, $y$ and $\tilde{y}$, it appears that $\Phi^*_{h,f}$ is defined by the coefficients $A^* = e b^T - A$ and $b^* = b$. The flow $\Phi_{h,f}$ is thus symmetric as soon as $e b^T - A = PAP$ and $b = Pb$, which is nothing but condition (14). □

Remark 3 For methods without redundant stages, condition (14) is also necessary.

Example 4 The implicit midpoint rule, defined by $A = \frac{1}{2}$ and $b = 1$, is a symmetric method of order 2. More generally, the $s$-stage Gauss collocation method based on the roots of the $s$th shifted Legendre polynomial is a symmetric method of order $2s$. For instance, the 2-stage and 3-stage Gauss methods of orders 4 and 6 have the following coefficients:
$$\begin{array}{c|cc}
\frac{1}{2} - \frac{\sqrt{3}}{6} & \frac{1}{4} & \frac{1}{4} - \frac{\sqrt{3}}{6} \\
\frac{1}{2} + \frac{\sqrt{3}}{6} & \frac{1}{4} + \frac{\sqrt{3}}{6} & \frac{1}{4} \\
\hline
 & \frac{1}{2} & \frac{1}{2}
\end{array}
\qquad
\begin{array}{c|ccc}
\frac{1}{2} - \frac{\sqrt{15}}{10} & \frac{5}{36} & \frac{2}{9} - \frac{\sqrt{15}}{15} & \frac{5}{36} - \frac{\sqrt{15}}{30} \\
\frac{1}{2} & \frac{5}{36} + \frac{\sqrt{15}}{24} & \frac{2}{9} & \frac{5}{36} - \frac{\sqrt{15}}{24} \\
\frac{1}{2} + \frac{\sqrt{15}}{10} & \frac{5}{36} + \frac{\sqrt{15}}{30} & \frac{2}{9} + \frac{\sqrt{15}}{15} & \frac{5}{36} \\
\hline
 & \frac{5}{18} & \frac{4}{9} & \frac{5}{18}
\end{array} \qquad (17)$$

Methods for partitioned systems. For systems of the form
$$\dot{y} = f(z), \quad \dot{z} = g(y), \qquad (18)$$

it is natural to apply two different Runge-Kutta methods to the variables $y$ and $z$. Written in compact form, a partitioned Runge-Kutta method reads
$$Y = e \otimes y + h (A \otimes I) F(Z),$$
$$Z = e \otimes z + h (\hat{A} \otimes I) G(Y),$$
$$\tilde{y} = y + h (b^T \otimes I) F(Z),$$
$$\tilde{z} = z + h (\hat{b}^T \otimes I) G(Y),$$
and the method is symmetric if both $(A, b)$ and $(\hat{A}, \hat{b})$ are. An important feature of partitioned Runge-Kutta methods is that they can be symmetric and explicit for systems of the form (18).

Example 5 The Verlet method is defined by the following two Runge-Kutta tableaux:
$$\begin{array}{c|cc} 0 & 0 & 0 \\ 1 & \frac{1}{2} & \frac{1}{2} \\ \hline & \frac{1}{2} & \frac{1}{2} \end{array}
\qquad \text{and} \qquad
\begin{array}{c|cc} \frac{1}{2} & \frac{1}{2} & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 \\ \hline & \frac{1}{2} & \frac{1}{2} \end{array} \qquad (19)$$

The method becomes explicit owing to the specialstructure of the partitioned system:

Y1 D y0; Z1 D z0 C h2f .Y1/;

Y2 D y0 C hg.Z1/; Z2 D Z1;

y1 D Y2; z1 D z0C h2

f .Y1/Cf .Y2/

The Verlet method is the most elementary method ofthe class of partitioned Runge-Kutta methods knownas Lobatto IIIA-IIIB. Unfortunately, methods of higherorders within this class are no longer explicit in gen-eral, even for the equations of the form (18). It isnevertheless possible to construct symmetric explicitRunge-Kutta methods, which turn out to be equivalentto compositions of Verlet’s method and whose intro-duction is for this reason postponed to the next section.

Note that a particular instance of partitioned sys-tems are second-order differential equations of theform

Py D z; Pz D g.y/; (20)

which covers many situations of practical interest (forinstance, mechanical systems governed by Newton’slaw in absence of friction).

Symmetric Methods Obtained byComposition

Another class of symmetric methods is constituted ofsymmetric compositions of low-order methods. Theidea consists in applying a basic method ˚h;f with asequence of prescribed stepsizes: Given s real numbers�1; : : : ; �s , its composition with stepsizes �1h; : : : ; �shgives rise to a new method:

‰h;f D ˚�sh;f ı : : : ı ˚�1h;f : (21)

Noticing that the local error of ‰h;f , defined by‰h;f .y/� 'h;f .y/, is of the form

.�pC11 C : : :C �pC1

s /hpC1C.y/C O.hpC2/;

as soon as �1 C : : : C �s D 1, ‰h;f is of order at leastp C 1 if

�pC11 C : : :C �pC1

s D 0:

This observation is the key to triple jump compositions,as proposed by a series of authors [3,5,18,21]: Startingfrom a symmetric method ˚h;f of (even) order 2q, thenew method obtained for

�1 D �3 D 1

2 � 21=.2qC1/ and

�2 D 21=.2qC1/

2� 21=.2qC1/

is symmetric

‰�h;f D ˚�

�1h;fı ˚�

�2h;fı ˚�

�3h;f

D ˚�3h;f ı ˚�2h;f ı ˚�1h;f D ‰h;f

and of order at least 2q C 1. Since the order of asymmetric method is even, ‰h;f is in fact of order2qC2. The procedure can then be repeated recursivelyto construct arbitrarily high-order symmetric methodsof orders 2qC2, 2qC4, 2qC6, : : :., with respectively3, 9, 27, : : :, compositions of the original basic method˚h;f . However, the construction is far from being themost efficient, for the combined coefficients becomelarge, some of which being negatives. A partial remedyis to envisage compositions with s D 5. We hereby givethe coefficients obtained by Suzuki [18]:

�1 D �2 D �4 D �5 D 1

4 � 41=.2qC1/ and

�3 D � 41=.2qC1/

4 � 41=.2qC1/

which give rise to very efficient methods for q D 1

and q D 2. The most efficient high-order compositionmethods are nevertheless obtained by solving the fullsystem of order conditions, i.e., by raising the order di-rectly from 2 to 8, for instance, without going throughthe intermediate steps described above. This requiresmuch more effort though, first to derive the orderconditions and then to solve the resulting polynomialsystem. It is out of the scope of this article to describethe two steps involved, and we rather refer to the paper[15] on the use of 1B-series for order conditionsand to Chap. V.3.2. of [10] for various examples andnumerical comparisons. An excellent method of order6 with 9 stages has been obtained by Kahan and Li [12]and we reproduce here its coefficients:

Page 160: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

1446 Symmetric Methods

10510−9

10−8

10−7

10−6

10−5

10−4

10−3

10−2

10−1

100

Number of Verlet steps

Err

or a

t tim

e t=

100

Kahan and LiTriple jump

�1 D �9 D 0:3921614440073141;

�2 D �8 D 0:3325991367893594;

�3 D �7 D �0:7062461725576393;�4 D �6 D 0:0822135962935508;

�5 D 0:7985439909348299:

For the sake of illustration, we have computed thesolution of Kepler’s equations with this method andthe method of order 6 obtained by the triple jumptechnique. In both cases, the basic method is Verlet’sscheme. The gain offered by the method of Kahan andLi is impressive (it amounts to two digits of accuracyon this example). Other methods can be found, forinstance, in [10, 14].

Remark 4 It is also possible to consider symmetriccompositions of nonsymmetric methods. In this situa-tion, raising the order necessitates to compose the basicmethod and its adjoint.

Symmetric Methods for Highly OscillatoryProblems

In this section, we present methods aimed at solvingproblems of the form

Rq D �rVfast .q/ � rVslow.q/ (22)

where Vfast and Vslow are two potentials acting ondifferent time scales, typically such that r2Vfast ispositive semi-definite and kr2Vfastk >> kr2Vslowk.Explicit standard methods suffer from severe stabilityrestrictions due to the presence of high oscillationsat the slow time scale and necessitate small stepsand many evaluations of the forces. Since slow forces�rVslow are in many applications much more expen-sive to evaluate than fast ones, efficient methods in thiscontext are thus devised to require significantly fewerevaluations per step of the slow force.

Example 6 In applications to molecular dynamics, forinstance, fast forces deriving from Vfast (short-rangeinteractions) are much cheaper to evaluate than slowforces deriving from Vslow (long-range interactions).Other examples of applications are presented in [11].

Methods for general problems with nonlinear fastpotentials. Introducing the variable p D Pq in (22),the equation reads

� PqPp�

„ƒ‚…Py

D�p

0

„ƒ‚…fK.y/

C�

0

�rqVfast .q/

„ ƒ‚ …fF .y/

Page 161: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

Symmetric Methods 1447

S

C�

0

�rqVslow.q/

„ ƒ‚ …fS .y/

:

The usual Verlet method [20] would consist in compos-ing the flows 'h;.fF CfS / and 'h;fK as follows:

'h2 ;.fF CfS / ı 'h;fK ı 'h

2 ;.fF CfS /

or, if necessary, numerical approximations thereof andwould typically be restricted to very small stepsizes.The impulse method [6,8,19] combines the three piecesof the vector field differently:

'h2 ;fS

ı 'h;.fKCfF / ı 'h2 ;fS

:

Note that 'h;fS is explicit

'h;fS

�q

p

�D�

q

p � hrqVslow.q/

while 'h;.fKCfF / may require to be approximated by anumerical method˚h;.fKCfF / which uses stepsizes thatare fractions of h. If ˚h;.fKCfF / is symmetric (and/orsymplectic), the overall method is symmetric as well

(and/or symplectic) and allows for larger stepsizes.However, it still suffers from resonances and a betteroption is given by the mollified impulse methods,which considers the mollified potential NVslow.q/ DVslow.a.q// in loco of Vslow.q/, where a.q/ and a0.q/are averaged values given by

a.q/ D 1

h

Z h

0

x.s/ds; a0.q/ D 1

h

Z h

0

X.s/ds

where

Rx D �rVfast .x/; x.0/ D q; Px.0/ D p;

RX D �r2Vfast .x/X;X.0/ D I; PX.0/ D 0: (23)

The resulting method uses the mollified force�a0.q/T .rqVslow/.a.q// and is still symmetric (and/orsymplectic) provided (23) is solved with a symmetric(and/or symplectic) method.

Methods for problems with quadratic fast poten-tials. In many applications of practical importance,the potential Vfast is quadratic of the form Vfast .q/ D12qT˝2q. In this case, the mollified impulse method

falls into the class of trigonometric symmetric methodsof the form

ˆh

�p

q

�D R.h˝/

�p

q

�� 1

2h

0@ 0.h˝/rVslow

�.h˝/q0

�C 1.h˝/rVslow

�.h˝/q1

h .h˝/rVslow

�.h˝/q0

�1A

where R.h˝/ is the block matrix given by

R.h˝/ D�

cos.h˝/ �˝ sin.h˝/˝�1 sin.h˝/ cos.h˝/

and the functions �, , 0 and 1 are even functionssuch that

.z/ D sin.z/

z 1.z/; 0.z/ D cos.z/ 1.z/; and

.0/ D �.0/ D 1:

Various choices of functions and � are possibleand documented in the literature. Two particularly

interesting ones are .z/ D sin2.z/z2

, �.z/ D 1 (see [9])

or .z/ D sin3.z/z3

, �.z/ D sin.z/z (see [7]).

Conclusion

This entry should be regarded as an introduction to thesubject of symmetric methods. Several topics have notbeen exposed here, such as symmetric projection forODEs on manifolds, DAEs of index 1 or 2, symmet-ric multistep methods, symmetric splitting methods,and symmetric Lie-group methods, and we refer the

Page 162: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

1448 Symmetries and FFT

interested reader to [10, 13, 16] for a comprehensivepresentation of these topics.

References

1. Butcher, J.C.: Implicit Runge-Kutta processes. Math.Comput. 18, 50–64 (1964)

2. Chartier, P., Hairer, E., Vilmart, G.: Algebraic structures ofB-series. Found. Comput. Math. 10(4), 407–427 (2010)

3. Creutz, M., Gocksch, A.: Higher-order hybrid Monte Carloalgorithms. Phys. Rev. Lett. 63, 9–12 (1989)

4. Faou, E., Hairer, E., Pham, T.L.: Energy conservation withnon-symplectic methods: examples and counter-examples.BIT 44(4), 699–709 (2004)

5. Forest, E.: Canonical integrators as tracking codes. AIPConf. Proc. 184, 1106–1136 (1989)

6. Garcıa-Archilla, B., Sanz-Serna, J.M., Skeel, R.D.: Long-time-step methods for oscillatory differential equations.SIAM J. Sci. Comput. 20, 930–963 (1999)

7. Grimm, V., Hochbruck, M.: Error analysis of exponential in-tegrators for oscillatory second-order differential equations.J. Phys. A 39, 5495–5507 (2006)

8. Grubmuller, H., Heller, H., Windemuth, A., Tavan, P.: Gen-eralized Verlet algorithm for efficient molecular dynamicssimulations with long-range interactions. Mol. Sim. 6, 121–142 (1991)

9. Hairer, E., Lubich, C.: Long-time energy conservation ofnumerical methods for oscillatory differential equations.SIAM J. Numer. Anal. 38, 414–441 (2000)

10. Hairer, E., Lubich, C., Wanner, G.: Geometric NumericalIntegration: Structure-Preserving Algorithms for OrdinaryDifferential Equations, 2nd edn. Springer Series in Compu-tational Mathematics, vol. 31. Springer, Berlin (2006)

11. Jia, Z., Leimkuhler, B.: Geometric integrators for multipletime-scale simulation. J. Phys. A 39, 5379–5403 (2006)

12. Kahan, W., Li, R.C.: Composition constants for raising theorders of unconventional schemes for ordinary differentialequations. Math. Comput. 66, 1089–1099 (1997)

13. Leimkuhler, B., Reich, S.: Simulating Hamiltonian Dynam-ics. Cambridge Monographs on Applied and ComputationalMathematics, vol. 14. Cambridge University Press, Cam-bridge (2004)

14. McLachlan, R.I., Quispel, G.R.W.: Splitting methods. ActaNumer. 11, 341–434 (2002)

15. Murua, A., Sanz-Serna, J.M.: Order conditions for numer-ical integrators obtained by composing simpler integrators.Philos. Trans. R. Soc. Lond. A 357, 1079–1100 (1999)

16. Sanz-Serna, J.M., Calvo, M.P.: Numerical HamiltonianProblems. Chapman & Hall, London (1994)

17. Stoffer, D.: On reversible and canonical integration meth-ods. Technical Report SAM-Report No. 88-05, ETH-Zurich(1988)

18. Suzuki, M.: Fractal decomposition of exponential operatorswith applications to many-body theories and monte carlosimulations. Phys. Lett. A 146, 319–323 (1990)

19. Tuckerman, M., Berne, B.J., Martyna, G.J.: Reversible mul-tiple time scale molecular dynamics. J. Chem. Phys. 97,1990–2001 (1992)

20. Verlet, L.: Computer “experiments” on classical fluids. i.Thermodynamical properties of Lennard-Jones molecules.Phys. Rev. 159, 98–103 (1967)

21. Yoshida, H.: Construction of higher order symplectic inte-grators. Phys. Lett. A 150, 262–268 (1990)

Symmetries and FFT

Hans Z. Munthe-KaasDepartment of Mathematics, University of Bergen,Bergen, Norway

Synonyms

Fourier transform; Group theory; Representation the-ory; Symmetries

Synopsis

The fast Fourier transform (FFT), group theory, andsymmetry of linear operators are mathematical topicswhich all connect through group representation theory.This entry provides a brief introduction, with emphasison computational applications.

Symmetric FFTs

The finite Fourier transform maps functions on a pe-riodic lattice to a dual Fourier domain. Formally, letZn D Zn1 � Zn2 � � � � � Zn` be a finite abelian(commutative) group, where Znj is the cyclic groupZnj D f0; 1; : : : ; nj �1g with group operation C.modnj / and n D .n1; : : : ; n`/ is a multi-index withjnj D n1n2 � � �n`. Let CZn denote the linear spaceof complex-valued functions on Zn. The primal do-main Zn is an `-dimensional lattice periodic in alldirections, and the Fourier domain is bZn D Zn (thisis the Pontryagin dual group [12]). For infinite abeliangroups, the primal and Fourier domains in general dif-fer, such as Fourier series on the circle where bR=Z DZ. The discrete Fourier transform (DFT) is an (up toscaling) unitary map F WCZn ! CbZn. Letting F.f / bf , we have

Page 163: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

Symmetries and FFT 1449

S

bf .k/ DXj2Zn

f .j/e2�ik1j1n1

C���C k`j`n`

�;

for k D .k1; : : : ; k`/ 2 bZn, (1)

f .j/ D 1

jnjXk2bZn

bf .k/e�2�ik1j1n1

C���C k`j`n`

�;

for j D .j1; : : : ; j`/ 2 Zn. (2)

A symmetry for a function f 2 CZn is an R-linearmap S WCZn ! CZn such that Sf D f . As examples,consider real symmetry SRf .j/ D f .j/, even sym-metry Sef .j/ D f .�j/ and odd symmetry Sof .j/ D�f .�j/. If f has a symmetry S , then bf has an adjointsymmetry bS D FSF�1, example bSe D Se, bSo D So

and cSRbf .k/ D bf .�k/. The set of all symmetriesof f forms a group, i.e., the set of symmetries isclosed under composition and inversion. Equivalently,the symmetries can be specified by defining an abstractgroup G and a map RWG ! LinR.CZn/, which foreach g 2 G defines an R-linear map R.g/ on CZn,such that R.gg0/ D R.g/R.g0/ for all g; g0 2 G. R isan example of a real representation of G.

The DFT on Zn is computed by the fast Fouriertransform (FFT), costing O.jnj log.jnj// floating pointoperations. It is possible to exploit may symmetries inthe computation of the FFT, and for large classes ofsymmetry groups, savings a factor jGj can be obtainedcompared to the nonsymmetric FFT.

Equivariant Linear Operators and the GFT

RepresentationsLet G be a finite group with jGj elements. A dR-dimensional unitary representation of G is a mapRWG ! U.dR/ such that R.gh/ D R.g/R.h/ forall g; h 2 G, where U.dR/ is the set of dR � dRunitary matrices. More generally, a representation isa linear action of a group on a vector space. Tworepresentations R and eR are equivalent if there existsa matrix X such that eR.g/ D XR.g/X�1 for allg 2 G. A representation R is reducible if it is equiv-alent to a block diagonal representation; otherwise itis irreducible. For any finite group G, there exists acomplete list of nonequivalent irreducible representa-tions R D f�1; �2; : : : ; �ng, henceforth called irreps,

such thatP

�2R d2� D jGj. For example, the cyclicgroup Zn D f0; 1; : : : ; r � 1g with group operationC.modn/ has exactly n 1-dimensional irreps given as�k.j / D exp.2�ikj=n/. A matrix A is equivariantwith respect to a representation R of a group G ifAR.g/ D R.g/A for all g 2 G. Any representationR can be block diagonalized, with irreducible repre-sentations on the diagonal. This provides a change ofbasis matrix F such that FAF �1 is block diagonalfor any R-equivariant A. This result underlies most ofcomputational Fourier analysis and will be exemplifiedby convolutional operators.

Convolutions in the Group AlgebraThe group algebra CG is the complex jGj-dimensionalvector space where the elements ofG are the basis vec-tors; equivalently CG consists of all complex-valuedfunctions on G. The product in G extends linearly tothe convolution product WCG � CG ! CG, givenas .a b/.g/ D P

h2G a.h/b.h�1g/ for all g 2 G.The right regular representation of G on CG is, forevery h 2 G, a linear map R.h/WCG ! CG givenas right translation R.h/a.g/ D a.gh/. A linear mapAWCG ! CG is convolutional (i.e., there exists ana 2 CG such that Ab D a b for all b 2 CG) if andonly if A is equivariant with respect to the right regularrepresentation.

The Generalized Fourier Transform (GFT)The generalized Fourier transform [6, 10] and theinverse are given as

ba.�/ DXg2G

a.g/�.g/ 2 Cd��d� ; for all � 2 R (3)

a.g/ D 1

jGjX�2R

d�trace��.g�1/ba.�/� ; for all g 2 G.

(4)

From the convolution formula 1a b.�/ D ba.�/bb.�/,we conclude: The GFT block-diagonalizes convolu-tional operators on CG. The blocks are of size d�, thedimensions of the irreps.

Equivariant Linear OperatorsMore generally, consider a linear operator AWV ! Vwhere V is a finite-dimensional vector space and A isequivariant with respect to a linear right action of G

Page 164: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

1450 Symmetries and FFT

on V . If the action is free and transitive, then A isconvolutional on CG. If the action is not transitive,then V splits in s separate orbits under the action ofG, and A is a block-convolutional operator. In thiscase, the GFT block diagonalizesA with blocks of sizeapproximately sd�. The theory generalizes to infinitecompact Lie groups via the Peter–Weyl theorem andto certain important non-compact groups (unimodulargroups), such as the group of Euclidean transforma-tions acting on R

n; see [13].

Applications

Symmetric FFTs appear in many situations, such asreal sine and cosine transforms. The 1-dim real cosinetransform has four symmetries generated by SR and Se,and it can be computed four times faster than the fullcomplex FFT. This transform is central in Chebyshevapproximations. More generally, multivariate Cheby-shev polynomials possess symmetries of kaleidoscopicreflection groups acting upon Zn [9, 11, 14].

Diagonalization of equivariant linear operators isessential in signal and image processing, statistics, anddifferential equations. For a cyclic group Zn, an equiv-ariantA is a Toeplitz circulant, and the GFT is given bythe discrete Fourier transform, which can be computedfast by the FFT algorithm. More generally, a finiteabelian group Zn has jnj one-dimensional irreps, givenby the exponential functions. Zn-equivariant matricesare block Toeplitz circulant matrices. These are diago-nalized by multidimensional FFTs. Linear differentialoperators with constant coefficients typically lead todiscrete linear operators which commute with trans-lations acting on the domain. In the case of periodicboundary conditions, this yields block circulant matri-ces. For more complicated boundaries, block circulantsmay provide useful approximations to the differentialoperators.

More generally, many computational problems pos-sess symmetries given by a (discrete or continuous)group acting on the domain. For example, the Lapla-cian operator commutes with any isometry of the do-main. This can be discretized as an equivariant discreteoperator. If the group action on the discretized domainis free and transitive, the discrete operator is a convolu-tion in the group algebra. More generally, it is a block-convolutional operator. For computational efficiency,it is important to identify the symmetries (or approx-

imate symmetries) of the problem and employ theirreducible characters of the symmetry group and theGFT to (approximately) block diagonalize the opera-tors. Such techniques are called domain reduction tech-niques [7]. Block diagonalization of linear operatorshas applications in solving linear systems, eigenvalueproblems, and computation of matrix exponentials.The GFT has also applications in image processing,image registration, and computational statistics [1–5, 8].

References

1. Ahlander, K., Munthe-Kaas, H.: Applications of the gener-alized Fourier transform in numerical linear algebra. BITNumer. Math. 45(4), 819–850 (2005)

2. Allgower, E., Bohmer, K., Georg, K., Miranda, R.: Ex-ploiting symmetry in boundary element methods. SIAM J.Numer. Anal. 29, 534–552 (1992)

3. Bossavit, A.: Symmetry, groups, and boundary value prob-lems. A progressive introduction to noncommutative har-monic analysis of partial differential equations in domainswith geometrical symmetry. Comput. Methods Appl. Mech.Eng. 56(2), 167–215 (1986)

4. Chirikjian, G., Kyatkin, A.: Engineering Applications ofNoncommutative Harmonic Analysis. CRC, Boca Raton(2000)

5. Diaconis, P.: Group Representations in Probability andStatistics. Institute of Mathematical Statistics, Hayward(1988)

6. Diaconis, P., Rockmore, D.: Efficient computation of theFourier transform on finite groups. J. Am. Math. Soc. 3(2),297–332 (1990)

7. Douglas, C., Mandel, J.: An abstract theory for the domainreduction method. Computing 48(1), 73–96 (1992)

8. Fassler, A., Stiefel, E.: Group Theoretical Methods andTheir Applications. Birkhauser, Boston (1992)

9. Hoffman, M., Withers, W.: Generalized Chebyshev polyno-mials associated with affine Weyl groups. Am. Math. Soc.308(1), 91–104 (1988)

10. Maslen, D., Rockmore, D.: Generalized FFTs—a survey ofsome recent results. Publisher: In: Groups and ComputationII, Am. Math. Soc. vol. 28, pp. 183–287 (1997)

11. Munthe-Kaas, H.Z., Nome, M., Ryland, B.N.: Through theKaleidoscope; symmetries, groups and Chebyshev approx-imations from a computational point of view. In: Founda-tions of Computational Mathematics. Cambridge UniversityPress, Cambridge (2012)

12. Rudin, W.: Fourier Analysis on Groups. Wiley, New York(1962)

13. Serre, J.: Linear Representations of Finite Groups, vol 42.Springer, New York (1977)

14. Ten Eyck, L.: Crystallographic fast Fourier transforms. ActaCrystallogr Sect. A Crystal Phys. Diffr. Theor. GeneralCrystallogr. 29(2), 183–191 (1973)

Page 165: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

Symplectic Methods 1451

S

SymplecticMethods

J.M. Sanz-SernaDepartamento de Matematica Aplicada, Universidadde Valladolid, Valladolid, Spain

Definition

This entry, concerned with the practical task of in-tegrating numerically Hamiltonian systems, followsup the entry �Hamiltonian Systems and keeps thenotation and terminology used there.

Each one-step numerical integrator is specified bya smooth map �H

tnC1;tnthat advances the numerical

solution from a time level tn to the next tnC1

.pnC1; qnC1/ D �HtnC1;tn

.pn; qn/I (1)

the superscript H refers to the Hamiltonian functionH.p; qI t/ of the system being integrated. For instancefor the explicit Euler rule

.pnC1; qnC1/ D .pn; qn/C .tnC1 � tn/�f .pn; qnI tn/;

g.pn; qnI tn/�I

here and later f and g denote the d -dimensionalreal vectors with entries �@H=@qi , @H=@pi (d is thenumber of degrees of freedom) so that .f; g/ is thecanonical vector field associated with H (in simplerwords: the right-hand side of Hamilton’s equations).For the integrator to make sense, �H

tnC1;tnhas to approx-

imate the solution operator ˚HtnC1;tn

that advances thetrue solution from its value at tn to its value at tnC1:

�p.tnC1/; q.tnC1/

� D ˚HtnC1;tn

�p.tn/; q.tn/

�:

For a method of (consistency) order �, �HtnC1;tn

differs

from ˚HtnC1;tn

in terms of magnitude O�.tnC1� tn/�C1�.

The solution map ˚HtnC1;tn

is a canonical (symplec-tic) transformation in phase space, an important factthat substantially constrains the dynamics of the truesolution

�p.t/; q.t/

�. If we wish the approximation

�H to retain the “Hamiltonian” features of ˚H , weshould insist on �H also being a symplectic transfor-mation. However, most standard numerical integrators– including explicit Runge–Kutta methods, regardless

of their order � – replace ˚H by a nonsymplecticmapping �H . This is illustrated in Fig. 1 that corre-sponds to the Euler rule as applied to the harmonicoscillator Pp D �q, Pq D p. The (constant) step sizeis tnC1 � tn D 2�=12. We have taken as a familyof initial conditions the points of a circle centered atp D 1, q D 0 and seen the evolution after 1, 2, . . . , 12steps. Clearly the circle, which should move clockwisewithout changing area, gains area as the integrationproceeds: The numerical �H is not symplectic. Asa result, the origin, a center in the true dynamics, isturned by the discretization procedure into an unstablespiral point, i.e., into something that cannot arise inHamiltonian dynamics. For the implicit Euler rule, thecorresponding integration loses area and gives rise to afamily of smaller and smaller circles that spiral towardthe origin. Again, such a stable focus is incompatiblewith Hamiltonian dynamics.

This failure of well-known methods in mimickingHamiltonian dynamics motivated the consideration ofintegrators that generate a symplectic mapping �H

when applied to a Hamiltonian problem. Such methodsare called symplectic or canonical. Since symplec-tic transformations also preserve volume, symplecticintegrators applied to Hamiltonian problems are au-tomatically volume preserving. On the other hand,while many important symplectic integrators are time-

−5

−4

−3

−2

−1

0

1

2

3

4

5

−4 −2 0 2 4q

p

Symplectic Methods, Fig. 1 The harmonic oscillator inte-grated by the explicit Euler method

Page 166: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

1452 Symplectic Methods

reversible (symmetric), reversibility is neither suffi-cient nor necessary for a method to be symplectic ([8],Remark 6.5).

Even though early examples of symplectic integra-tion may be traced back to the 1950s, the systematicexploration of the subject started with the work of FengKang (1920–1993) in the 1980s. An early short mono-graph is [8] and later books are the comprehensive[5] and the more applied [6]. Symplectic integrationwas the first step in the larger endeavor of developingstructure-preserving integrators, i.e., of what is nowoften called, following [7], geometric integration.

Limitations of space restrict this entry to one-stepmethods and canonical Hamiltonian problems. Fornoncanonical Hamiltonian systems and multistep inte-grators the reader is referred to [5], Chaps. VII and XV.

Integrators Based on GeneratingFunctions

The earliest systematic approaches by Feng Kang andothers to the construction of symplectic integrators (see[5], Sect. VI.5.4 and [8], Sect. 11.2) exploited the fol-lowing well-known result of the canonical formalism:The canonical transformation ˚H

tnC1;tnpossesses a gen-

erating function S2 that solves an initial value problemfor the associated Hamilton–Jacobi equation. It is thenpossible, by Taylor expanding that equation, to obtainan approximationeS2 to S2. The transformation�H

tnC1;tn

generated by eS2 will automatically be canonical andtherefore will define a symplectic integrator. If eS2differs from S2 by terms O

�.tnC1 � tn/

�C1�, the inte-grator will be of order �. Generally speaking, the high-order methods obtained by following this procedure aremore difficult to implement than those derived by thetechniques discussed in the next two sections.

Runge–Kutta and Related Integrators

In 1988, Lasagni, Sanz-Serna, and Suris (see[8], Chap. 6) discovered independently that somewell-known families of numerical methods containsymplectic integrators.

Runge–Kutta Methods

Symplecticness ConditionsWhen the Runge–Kutta (RK) method with s stagesspecified by the tableau

a11 � � � a1s:::

: : ::::

as1 � � � assb1 � � � bs

(2)

is applied to the integration of the Hamiltonian systemwith Hamiltonian functionH , the relation (1) takes theform

pnC1 D pn C hnC1sXiD1

bi f .Pi ;Qi I tn C cihnC1/;

qnC1 D qn C hnC1sXiD1

bi g.Pi ;Qi I tn C cihnC1/;

where ci D Pj aij are the abscissae, hnC1 D tnC1� tn

is the step size and Pi ,Qi , i D 1; : : : ; s are the internalstage vectors defined through the system

Pi D pn C hnC1sX

jD1aij f .Pj ;Qj I tn C cj hnC1/;

(3)

Qi D qn C hnC1sX

jD1aij g.Pj ;Qj I tn C cj hnC1/:

(4)

Lasagni, Sanz-Serna, and Suris proved that if thecoefficients of the method in (2) satisfy

biaij C bj aj i � bibj D 0; i; j D 1; : : : ; s; (5)

then the method is symplectic. Conversely ([8],Sect. 6.5), the relations (5) are essentially necessaryfor the method to be symplectic. Furthermore forsymplectic RK methods the transformation (1) is infact exact symplectic ([8], Remark 11.1).

Order ConditionsDue to symmetry considerations, the relations (5) im-pose s.s C 1/=2 independent equations on the s2 C s

Page 167: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

Symplectic Methods 1453

S

elements of the RK tableau (2), so that there is noshortage of symplectic RK methods. The available freeparameters may be used to increase the accuracy of themethod. It is well known that the requirement that anRK formula has a target order leads to a set of nonlinearrelations (order conditions) between the elements ofthe corresponding tableau (2). For order � � thereis an order condition associated with each rooted treewith � � vertices and, if the aij and bi are freeparameters, the order conditions are mutually inde-pendent. For symplectic methods however the tableaucoefficients are constrained by (5), and Sanz-Serna andAbia proved in 1991 that then there are redundanciesbetween the order conditions ([8], Sect. 7.2). In factto ensure order � � when (5) holds it is necessaryand sufficient to impose an order condition for eachso-called nonsuperfluous (nonrooted) tree with � �

vertices.

Examples of Symplectic Runge–Kutta MethodsSetting j D i in (5) shows that explicit RK methods(with aij D 0 for i � j ) cannot be symplectic.

Sanz-Serna noted in 1988 ([8], Sect. 8.1) that theGauss method with s stages, s D 1; 2; : : : , (i.e., theunique method with s stages that attains the maximalorder 2s) is symplectic. When s D 1 the method isthe familiar implicit midpoint rule. Since for all Gaussmethods the matrix .aij / is full, the computation ofthe stage vectors Pi and Qi require, at each step,the solution of the system (3) and (4) that comprisess � 2d scalar equations. In non-stiff situations thissystem is readily solved by functional iteration, see[8] Sects. 5.4 and 5.5 and [5] Sect. VIII.6, and then theGauss methods combine the advantages of symplectic-ness, easy implementation, and high order with that ofbeing applicable to all canonical Hamiltonian systems.

If the system being solved is stiff (e.g., it arisesthrough discretization of the spatial variables of aHamiltonian partial differential equation), Newton it-eration has to be used to solve the stage equations (3)and (4), and for high-order Gauss methods the costof the linear algebra may be prohibitive. It is thenof interest to consider the possibility of diagonallyimplicit symplectic RK methods, i.e., methods whereaij D 0 for i < j and therefore (3) and (4) demandthe successive solution of s systems of dimension2d , rather than that of a single .s � 2d/–dimensionalsystem. It turns out ([8], Sect. 8.2) that such meth-ods are necessarily composition methods (see below)

obtained by concatenating implicit midpoint sub-stepsof lengths b1hnC1, . . . , bshnC1. The determination ofthe free parameters bi is a task best accomplished bymeans of the techniques used to analyze compositionmethods.

The B-series ApproachIn 1994, Calvo and Sanz-Serna ([5], Sect. VI.7.2) pro-vided an indirect technique for the derivation of thesymplecticness conditions (5). The first step is to iden-tify conditions for the symplecticness of the associatedB-series (i.e., the series that expands the transformation(1)) in powers of the step size. Then the conditions(on the B-series) obtained in this way are shown to beequivalent to (5). This kind of approach has proved tobe very powerful in the theory of geometric integration,where extensive use is made of formal power series.

Partitioned Runge–Kutta MethodsPartitioned Runge–Kutta (PRK) methods differ fromstandard RK integrators in that they use two tableaux ofcoefficients of the form (2): one to advance p and theother to advance q. Most developments of the theoryof symplectic RK methods are easily adapted to coverthe partitioned situation, see e.g., [8], Sects. 6.3, 7.3,and 8.4.

The main reason ([8], Sect. 8.4) to consider theclass of PRK methods is that it contains integratorsthat are both explicit and symplectic when appliedto separable Hamiltonian systems with H.p; qI t/ DT .p/CV.qI t/, a format that often appears in the appli-cations. It turns out ([8], Remark 8.1, [5], Sect. VI.4.1,Theorem 4.7) that such explicit, symplectic PRK meth-ods may always be viewed as splitting methods (seebelow). Moreover it is advantageous to perform theiranalysis by interpreting them as splitting algorithms.

Runge–Kutta–NystromMethodsIn the special but important case where the (separable)Hamiltonian is of the form H D .1=2/pTM�1p CV.qI t/ (M a positive-definite symmetric matrix) thecanonical equations

d

dtp D �rV.qI t/; d

dtq D M�1p (6)

lead tod2

dt2q D �M�1rV.qI t/;

Page 168: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

1454 Symplectic Methods

a second-order system whose right-hand side is in-dependent of .d=dt/q. Runge–Kutta–Nystrom (RKN)methods may then be applied to the second-order formand are likely to improve on RK integrations of theoriginal first-order system (6).

There are explicit, symplectic RKN integrators([8], Sect. 8.5). However their application (see [8],Remark 8.5) is always equivalent to the applicationof an explicit, symplectic PRK method to the first-order equations (6) and therefore – in view of aconsideration made above – to the application of asplitting algorithm.

Integrators Based on Splitting andComposition

The related ideas of splitting and composition areextremely fruitful in deriving practical symplectic in-tegrators in many fields of application. The corre-sponding methods are typically ad hoc for the problemat hand and do not enjoy the universal off-the-shelfapplicability of, say, Gaussian RK methods; however,when applicable, they may be highly efficient. In orderto simplify the exposition, we assume hereafter that theHamiltonian H is time-independent H D H.p; q/;we write �HhnC1

and HhnC1rather than ˚H

tnC1;tnand

�HtnC1;tn

. Furthermore, we shall denote the time stepby h omitting the possible dependence on the stepnumber n.

Splitting

Simplest SplittingThe easiest possibility of splitting occurs when theHamiltonian H may be written as H1 C H2 and theHamiltonian systems associated with H1 and H2 maybe explicitly integrated. If the corresponding flowsare denoted by �H1t and �H2t , the recipe (Lie–Trottersplitting, [8], Sect. 12.4.2, [5], Sect. II.5)

Hh D �H2h ı �H1h (7)

defines the map (1) of a first-order integrator thatis symplectic (the mappings being composed in theright-hand side are Hamiltonian flows and thereforesymplectic). Splittings of H in more than two piecesare feasible but will not be examined here.

A particular case of (7) of great practicalsignificance is provided by the separable HamiltonianH.p; q/ D T .p/ C V.q/ with H1 D T , H2 D V ;the flows associated with H1 and H2 are respectivelygiven by

�p; q

� 7!�p; qC trT .p/�; �p; q� 7!�

p� trV.q/; q�:Thus, in this particular case the scheme (7) reads

pnC1 D pn � hrV.qnC1/; qnC1 D qn C hrT .pn/;(8)

and it is sometimes called the symplectic Euler rule(it is obviously possible to interchange the roles ofp and q). Alternatively, (8) may be considered as aone-stage, explicit, symplectic PRK integrator as in [8],Sect. 8.4.3.

As a second example of splitting, one may consider(nonseparable) formatsH D H1.p; q/CV �.q/, wherethe Hamiltonian system associated with H1 can beintegrated in closed form. For instance, H1 may cor-respond to a set of uncoupled harmonic oscillators andV �.q/ represent the potential energy of the interactionsbetween oscillators. Or H1 may correspond to theKeplerian motion of a point mass attracted to a fixedgravitational center and V � be a potential describingsome sort of perturbation.

Strang SplittingWith the notation in (7), the symmetric Strang formula([8], Sect. 12.4.3, [5], Sect. II.5)

N Hh D �H2h=2 ı �H1h ı �H2h=2 (9)

defines a time-reversible, second-order symplectic in-tegrator N Hh that improves on the first order (7).

In the separable Hamiltonian case H D T .p/ CV.q/, (9) leads to

pnC1=2 D pn � h

2rV.qn/;

qnC1 D qn C hrT .pnC1=2/;

pnC1 D pnC1=2 � h

2rV.qnC1/:

This is the Stormer–Leapfrog–Verlet method that playsa key role in molecular dynamics [6]. It is also possible

Page 169: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

Symplectic Methods 1455

S

to regard this integrator as an explicit, symplectic PRKwith two stages ([8], Sect. 8.4.3).

More Sophisticated FormulaeA further generalization of (7) is

�H2ˇsh

ı �H1˛sh ı �H2ˇs�1h ı � � � ı �H2ˇ1h ı �H1˛1h (10)

where the coefficients˛i andˇi ,P

i ˛i D 1,P

i ˇi D 1,are chosen so as to boost the order � of the method.A systematic treatment based on trees of the requiredorder conditions was given by Murua and Sanz-Sernain 1999 ([5], Sect. III.3). There has been much recentactivity in the development of accurate splitting coef-ficients ˛i , ˇi and the reader is referred to the entry� Splitting Methods in this encyclopedia.

In the particular case where the splitting is given byH D T .p/C V.q/, the family (10) provides the mostgeneral explicit, symplectic PRK integrator.

Splitting Combined with ApproximationsIn (7), (9), or (10) use is made of the exact solutionflows �H1t and �H2t . Even if one or both of these flowsare not available, it is still possible to employ theidea of splitting to construct symplectic integrators. Asimple example will be presented next, but many otherswill come easily to mind.

Assume that we wish to use a Strang-like methodbut �H1t is not available. We may then advance thenumerical solution via

�H2h=2 ı b H1

h ı �H2h=2; (11)

whereb H1h denotes a consistent method for the integra-

tion of the Hamiltonian problem associated with H1.If b H1

h is time-reversible, the composition (11) is alsotime-reversible and hence of order � D 2 (at least).And if b H1

h is symplectic, (11) will define a symplecticmethod.

CompositionA step of a composition method ([5], Sect. II.4) con-sists of a concatenation of a number of sub-stepsperformed with one or several simpler methods. Oftenthe aim is to create a high-order method out of low-order integrators; the composite method automaticallyinherits the conservation properties shared by the meth-ods being composed. The idea is of particular appeal

within the field of geometric integration, where it isfrequently not difficult to write down first- or second-order integrators with good conservation properties.

A useful example, due to Suzuki, Yoshida, andothers (see [8], Sect. 13.1), is as follows. Let Hh bea time-reversible integrator that we shall call the basicmethod and define the composition method b H

h by

b Hh D H˛h ı H.1�2˛/h ı H˛hI

if the basic method is symplectic, then b Hh will obvi-

ously be a symplectic method. It may be proved that,if ˛ D .1=3/.2 C 21=3 C 2�1=3/, then b H

h will haveorder � D 4. By using this idea one may performsymplectic, fourth-order accurate integrations whilereally implementing a simpler second-order integra-tor. The approach is particularly attractive when thedirect application of a fourth-order method (such asthe two-stage Gauss method) has been ruled out onimplementation grounds, but a suitable basic method(for instance the implicit midpoint rule or a schemederived by using Strang splitting) is available.

If the (time-reversible) basic method is of order 2�and ˛ D �

2�21=.2�C1/��1 thenb Hh will have order � D

2�C2; the recursive application of this idea shows thatit is possible to reach arbitrarily high orders startingfrom a method of order 2.

For further possibilities, see the entry �CompositionMethods and [8], Sect. 13.1, [5], Sects. II.4 and III.3.

TheModified Hamiltonian

The properties of symplectic integrators outlined in thenext section depend on the crucial fact that, when asymplectic integrator is used, a numerical solution ofthe Hamiltonian system with Hamiltonian H may beviewed as an (almost) exact solution of a Hamiltoniansystem whose Hamiltonian function eH (the so-calledmodified Hamiltonian) is a perturbation of H .

An example. Consider the application of the sym-plectic Euler rule (8) to a one-degree-of-freedom sys-tem with separable Hamiltonian H D T .p/ C V.q/.In order to describe the behavior of the points .pn; qn/computed by the algorithm, we could just say that theyapproximately behave like the solutions .p.tn/; q.tn//of the Hamiltonian system S being integrated. Thiswould not be a very precise description because the

Page 170: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

1456 Symplectic Methods

true flow �Hh and its numerical approximation Hhdiffer in O.h2/ terms. Can we find another differentialsystem S2 (called a modified system) so that (8) isconsistent of the second order with S2? The points.pn; qn/ would then be closer to the solutions ofS2 than to the solutions of the system S we wantto integrate. Straightforward Taylor expansions ([8],Sect. 10.1) lead to the following expression for S2(recall that f D �@H=@q, g D @H=@p)

d

dtp D f .q/C h

2g.p/f 0.q/;

d

dtq D g.p/

�h2g0.p/f .q/; (12)

where we recognize the Hamiltonian system with (h-dependent!) Hamiltonian

eHh2 D T .p/C V.q/C h

2T 0.p/V 0.q/ D H C O.h/:

(13)

Figure 2 corresponds to the pendulum equationsg.p/ D p, f .q/ D � sin q with initial conditionp.0/ D 0, q.0/ D 2. The stars plot the numericalsolution with h D 0:5. The dotted line H D constantprovides the true pendulum solution. The dash–dot lineeHh2 D constant gives the solution of the modified

system (12). The agreement of the computed pointswith the modified trajectory is very good.

The origin is a center of the modified system (recallthat a small Hamiltonian perturbation of a Hamiltoniancenter is still a center); this matches the fact that, inthe plot, the computed solution does not spiral in orout. On the other hand, the analogous modified systemfor the (nonsymplectic) integration in 1 is found not bea Hamiltonian system, but rather a system with neg-ative dissipation: This agrees with the spiral behaviorobserved there.

By adding extra O.h2/ terms to the right-hand sidesof (12), it is possible to construct a (more accurate)modified system S3 so that (8) is consistent of the thirdorder with S3; thus, S3 would provide an even betterdescription of the numerical solution. The proceduremay be iterated to get modified systems S4, S5, . . . andall of them turn out to be Hamiltonian.

−2.5

−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

2.5

−2 −1 0 1 2

∗∗∗

∗ ∗∗

q

pSymplectic Methods, Fig. 2 Computed points, true trajectory(dotted line) and modified trajectory (dash–dot line)

General case. Given an arbitrary Hamiltonian sys-tem with a smooth Hamiltonian H , a consistent sym-plectic integrator Hh and an arbitrary integer � > 0,it is possible ([8], Sect. 10.1) to construct a modifiedHamiltonian system S� with Hamiltonian function eHh

�,

such that Hh differs from the flow �eHh�

h in O.h�C1/terms. In fact, eHh

� may be chosen as a polynomial ofdegree < � in h; the term independent of h coincideswithH (cf. (13)) and for a method of order � the termsin h, . . . , h��1 vanish.

The polynomials in h eHh�, � D 2; 3; : : : are the

partial sums of a series in powers of h. Unfortunatelythis series does not in general converge for fixed h,

so that, in particular, the modified flows �eHh�

h cannotconverge to Hh as � " 1. Therefore, in general, it

is impossible to find a Hamiltonian eHh such that �eHh

h

coincides exactly with the integrator Hh . Neishtadt([8], Sect. 10.1) proved that by retaining for each h >0 a suitable number N D N.h/ of terms of theseries it is possible to obtain a Hamiltonian eHh suchthat �eHh

h differs from Hh in an exponentially smallquantity.

Here is the conclusion for the practitioner: Fora symplectic integrator applied to an autonomous

Page 171: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

Symplectic Methods 1457

S

Hamiltonian system, modified autonomous Hamil-tonian problems exist so that the computed points lie“very approximately” on the exact trajectories of themodified problems. This makes possible a backwarderror interpretation of the numerical results: Thecomputed solutions are solving “very approximately”a nearby Hamiltonian problem. In a modeling situationwhere the exact form of the Hamiltonian H may be indoubt, or some coefficients in H may be the result ofexperimental measurements, the fact that integratingthe model numerically introduces perturbations to Hcomparable to the uncertainty in H inherent in themodel is the most one can hope for.

On the other hand, when a nonsymplectic formulais used the modified systems are not Hamiltonian: Theprocess of numerical integration perturbs the model insuch a way as to take it out of the Hamiltonian class.

Variable steps. An important point to be noted isas follows: The backward error interpretation onlyholds if the numerical solution after n steps is com-puted by iterating n times one and the same symplec-tic map. If, alternatively, one composes n symplecticmaps (one from t0 to t1, a different one from t1to t2, etc.) the backward error interpretation is lost,because the modified system changes at each step ([8],Sect. 10.1.3).

As a consequence, most favorable properties ofsymplectic integrators (and of other geometric inte-grators) are lost when they are naively implementedwith variable step sizes. For a complete discussion ofthis difficulty and of ways to circumvent it, see [5],Sects. VIII 1–4.

Finding explicitly the modified Hamiltonians. Theexistence of a modified Hamiltonian system is a generalresult that derives directly from the symplecticness ofthe transformation Hh ([8], Sect. 10.1) and does notrequire any hypothesis on the particular nature of sucha transformation. However, much valuable informationmay be derived from the explicit construction of themodified Hamiltonians. For RK and related methods,a way to compute systematically the eHh

�’s was firstdescribed by Hairer in 1994 and then by Calvo, Mu-rua, and Sanz-Serna ([5], Sect. IX.9). For splitting andcomposition integrators, the eHh

�’s may be obtained byuse of the Baker–Campbell–Hausdorff formula ([8],Sect. 12.3, [5], Sect. III.4) that provides a means toexpress as a single flow the composition of two flows.

This kind of research relies very much on concepts andtechniques from the theory of Lie algebras.

Properties of Symplectic Integrators

We conclude by presenting an incomplete list of favor-able properties of symplectic integrators. Note that theadvantage of symplecticness become more prominentas the integration interval becomes longer.

Conservation of energy. For autonomous Hamilto-nians, the value ofH is of course a conserved quantityand the invariance of H usually expresses conser-vation of physical energy. Ge and Marsden provedin 1988 ([8], Sect. 10.3.2) that the requirements ofsymplecticness and exact conservation of H cannotbe met simultaneously by a bona fide numerical inte-grator. Nevertheless, symplectic integrators have verygood energy behavior ([5], Sect. IX.8): Under verygeneral hypotheses, for a symplectic integrator of or-der �: H.pn; qn/ D H.p0; q0/ C O.h�/, where theconstant implied in the O notation is independentof n over exponentially long time intervals nh �exp

�h0=.2h/

�.

Linear error growth in integrable systems. For aHamiltonian problem that is integrable in the senseof the Liouville–Arnold theorem, it may be proved([5], Sect. X.3) that, in (long) time intervals of lengthproportional to h�� , the errors in the action variablesare of magnitude O.h�/ and remain bounded, whilethe errors in angle variables are O.h�/ and exhibit agrowth that is only linear in t . By implication the errorgrowth in the components ofp and q will be O.h�/ andgrow, at most, linearly. Conventional integrators, in-cluding explicit Runge–Kutta methods, typically showquadratic error growth in this kind of situation andtherefore cannot be competitive in a sufficiently longintegration.

KAM theory. When the system is closed to inte-grable, the KAM theory ([5], Chap. X) ensures, amongother things, the existence of a number of invariant torithat contribute to the stability of the dynamics (see [8],Sect. 10.4 for an example). On each invariant torus themotion is quasiperiodic. Symplectic integrators ([5],Chap. X, Theorem 6.2) possess invariant tori O.h�/close to those of the system being integrated and

Page 172: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

1458 Systems Biology, Minimalist vs Exhaustive Strategies

furthermore the dynamics on each invariant torus isconjugate to its exact counterpart.

Linear error growth in other settings. Integrablesystems are not the only instance where symplecticintegrators lead to linear error growth. Other casesinclude, under suitable hypotheses, periodic orbits,solitons, relative equilibria, etc., see, among others,[1–4].

Cross-References

�B-Series�Composition Methods�Euler Methods, Explicit, Implicit, Symplectic�Gauss Methods�Hamiltonian Systems�Molecular Dynamics�Nystrom Methods�One-Step Methods, Order, Convergence�Runge–Kutta Methods, Explicit, Implicit� Symmetric Methods

References

1. Cano, B., Sanz-Serna, J.M.: Error growth in the numericalintegration of periodic orbits, with application to Hamil-tonian and reversible systems. SIAM J. Numer. Anal. 34,1391–1417 (1997)

2. de Frutos, J., Sanz-Serna, J.M.: Accuracy and conserva-tion properties in numerical integration: the case of theKorteweg-deVries equation. Numer. Math. 75, 421–445(1997)

3. Duran, A., Sanz-Serna, J.M.: The numerical integration ofrelative equilibrium solution. Geometric theory. Nonlinear-ity 11, 1547–1567 (1998)

4. Duran, A., Sanz-Serna, J.M.: The numerical integra-tion of relative equilibrium solutions. The nonlinearSchroedinger equation. IMA. J. Numer. Anal. 20, 235–261(1998)

5. Hairer, E., Lubich, Ch., Wanner, G.: Geometric NumericalIntegration, 2nd edn. Springer, Berlin (2006)

6. Leimkuhler, B., Reich, S.: Simulating Hamiltonian Dynam-ics. Cambridge University Press, Cambridge (2004)

7. Sanz-Serna, J.M.: Geometric integration. In: Duff, I.S.,Watson, G.A. (eds.) The State of the Art in NumericalAnalysis. Clarendon Press, Oxford (1997)

8. Sanz-Serna, J.M., Calvo, M.P.: Numerical HamiltonianProblems. Chapman & Hall, London (1994)

Systems Biology, Minimalist vsExhaustive Strategies

Jeremy GunawardenaDepartment of Systems Biology, Harvard MedicalSchool, Boston, MA, USA

Introduction

Systems biology may be defined as the study of howphysiology emerges from molecular interactions [11].Physiology tells us about function, whether at theorganismal, tissue, organ or cellular level; molecularinteractions tell us about mechanism. How do we relatemechanism to function? This has always been one ofthe central problems of biology and medicine but itattains a particular significance in systems biology be-cause the molecular realm is the base of the biologicalhierarchy. Once the molecules have been identified,there is nowhere left to go but up.

This is an enormous undertaking, encompassing,among other things, the development of multicellularorganisms from their unicellular precursors, the hier-archical scales from molecules to cells, tissues, andorgans, and the nature of malfunction, disease, and re-pair. Underlying all of this is evolution, without whichbiology can hardly be interpreted. Organisms are notdesigned to perform their functions, they have evolvedto do so—variation, transfer, drift, and selection havetinkered with them over 3:5 � 109 years—and this hashad profound implications for how their functions havebeen implemented at the molecular level [12].

The mechanistic viewpoint in biology has nearlyalways required a strongly quantitative perspective andtherefore also a reliance on quantitative models. If thistrend seems unfamiliar to those who have been rearedon molecular biology, it is only because our histori-cal horizons have shrunk. The quantitative approachwould have seemed obvious to physiologists, geneti-cists, and biochemists of an earlier generation. More-over, quantitative methods wax and wane within anindividual discipline as new experimental techniquesemerge and the focus shifts between the descriptiveand the functional. The great Santiago Ramon y Cajal,to whom we owe the conception of the central ner-vous system as a network of neurons, classified “theo-rists” with “contemplatives, bibliophiles and polyglots,

Page 173: SamplingTechniquesforComputational StatisticalPhysicsjin/PS/EACM.pdf · 1Edinburgh University School of Mathematics, Edinburgh, Scotland, UK 2Universite Paris Est, CERMICS, Projet

Systems Biology, Minimalist vs Exhaustive Strategies 1459

S

megalomaniacs, instrument addicts, misfits” [3]. Yet,when Cajal died in 1934, Alan Hodgkin was alreadystarting down the road that would lead to the Hodgkin-Huxley equations.

In a similar way, the qualitative molecular biologyof the previous era is shifting, painfully and with muchgrinding of gears, to a quantitative systems biology.What kind of mathematics will be needed to supportthis? Here, we focus on the level of abstraction formodelling cellular physiology at the molecular level.This glosses over many other relevant and hard prob-lems but allows us to bring out some of the distinctivechallenges of molecularity. We may caricature thecurrent situation in two extreme views. One approach,mindful of the enormous complexity at the molecularlevel, strives to encompass that complexity, to dive intoit, and to be exhaustive; the other, equally mindfulof the complexity but with a different psychology,strives to abstract from it, to rise above it, and to beminimalist. Here, we examine the requirements and theimplications of both strategies.

Models as Dynamical Systems

Many different kinds of models are available for de-scribing molecular systems. It is convenient to think ofeach as a dynamical system, consisting of a descriptionof the state of the system along with a descriptionof how that state changes in time. The system statetypically amalgamates the states of various molecularcomponents, which may be described at various levelsof abstraction. For instance, Boolean descriptions areoften used by experimentalists when discussing geneexpression: this gene is ON, while that other is OFF.Discrete dynamic models can represent time evolutionas updates determined by Boolean functions. At theother end of the abstraction scale, gene expression maybe seen as a complex stochastic process that takes placeat an individual promoter site on DNA: states may bedescribed by the numbers of mRNA molecules andthe time evolution may be described by a stochasticmaster equation. In a different physiological context,Ca2C ions are a “second messenger” in many keysignalling networks and show complex spatial andtemporal behaviour within an individual cell. The statemay need to be described as a concentration that variesin space and time and the time evolution by a partialdifferential equation. In the most widely-used form

of dynamical system, the state is represented by the(scalar) concentrations of the various molecular com-ponents in specific cellular compartments and the timeevolution by a system of coupled ordinary differentialequations (ODEs).

What is the right kind of model to use? That dependsentirely on the biological context, the kind of biologicalquestion that is being asked and on the experimentalcapabilities that can be brought to bear on the problem.Models in biology are not objective descriptions ofreality; they are descriptions of our assumptions aboutreality [2].

For our purposes, it will be simplest to discuss ODE models; much of what we say applies to other classes of models. Assume, therefore, that the system is described by the concentrations within specific cellular compartments of n components, x_1, ..., x_n, and that the time evolution is given, in vector form, by dx/dt = f(x; a). Here, a ∈ R^m is a vector of parameters. These may be quantities like proportionality constants in rate laws. They have to take numerical values before the dynamics on the state space can be fully defined, and thereby arises the "parameter problem" [6]. In any serious model, most of the parameter values are not known, nor can they be readily determined experimentally. (Even if some of them can, there is always a question of whether an in-vitro measurement reflects the in-vivo context.)
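For concreteness, here is a minimal sketch in Python of such an ODE model: a hypothetical two-component network whose rate laws and parameter values are invented purely for illustration (the vector a plays the role of the parameters discussed above).

```python
# A minimal sketch (illustrative only) of an ODE model dx/dt = f(x; a):
# a hypothetical two-component network with invented rate laws.
import numpy as np
from scipy.integrate import solve_ivp

def f(t, x, a):
    """Right-hand side f(x; a); a is the parameter vector."""
    x1, x2 = x
    dx1 = a[0] - a[1] * x1           # constant synthesis, first-order decay
    dx2 = a[2] * x1 - a[3] * x2      # x1-driven production, first-order decay
    return [dx1, dx2]

a = np.array([1.0, 0.5, 2.0, 1.0])   # the parameter values that must be chosen
sol = solve_ivp(f, (0.0, 20.0), [0.0, 0.0], args=(a,))
print(sol.y[:, -1])                   # trajectories settle near x = (2, 4)
```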

The dynamical behaviour of a system may depend crucially on the specific parameter values. As these values change through a bifurcation, the qualitative "shape" of the dynamics may alter drastically; for instance, steady states may alter their stability or appear or disappear [22]. In between bifurcations, the shape of the dynamics only alters in a quantitative way while the qualitative portrait remains the same. The geography of parameter space therefore breaks up into regions; within each region the qualitative portrait of the dynamics remains unaltered, although its quantitative details may change, while bifurcations take place between regions, resulting in qualitative changes in the dynamics (Fig. 1).
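The one-variable system of Fig. 1 makes this concrete; the sketch below integrates dx1/dt = a1 x1 (1 − x1) − a2 x1 in the two parameter regions (the numerical values are arbitrary representatives of each region) and recovers the qualitative change at a1 = a2.

```python
# The one-variable example of Fig. 1: dx1/dt = a1*x1*(1 - x1) - a2*x1.
# Crossing a1 = a2 (a transcritical bifurcation) changes the qualitative
# portrait of the dynamics.
import numpy as np
from scipy.integrate import solve_ivp

def f1(t, x, a1, a2):
    return a1 * x * (1.0 - x) - a2 * x

for a1, a2 in [(1.0, 2.0),    # region a1 <= a2: only stable state is x1 = 0
               (2.0, 1.0)]:   # region a1 > a2: stable state at x1 = 1 - a2/a1
    sol = solve_ivp(f1, (0.0, 50.0), [0.1], args=(a1, a2))
    print(f"a1={a1}, a2={a2}: x1 -> {sol.y[0, -1]:.3f} "
          f"(predicted {max(0.0, 1.0 - a2 / a1):.3f})")
```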

Parameterology

We see from this that parameter values matter. They are typically determined by fitting the model to experimental data, such as time series for the concentrations of some of the components.

[Figure 1 appears here: plots of f1(x1) = a1 x1 (1 − x1) − a2 x1, its terms, and the two parameter regions; see caption.]

Systems Biology, Minimalist vs Exhaustive Strategies, Fig. 1 The geography of parameter space. (a) A dynamical system with one state variable, x1, and two parameters, a1, a2: dx1/dt = f1(x1) = a1 x1 (1 − x1) − a2 x1. (b) Graphs of f1(x1) (dashed curve) and of the terms in it (solid curves), showing the steady states, where dx1/dt = f1(x1) = 0. (c) Parameter space breaks up into two regions: (1) a1 ≤ a2, in which the state space has a single stable steady state at x1 = 0, to which any positive initial condition tends (arrow); and (2) a1 > a2, in which there are two steady states, a positive stable state at x1 = 1 − a2/a1, to which any positive initial condition tends (arrows), and an unstable state at x1 = 0. Here, the state space is taken to be the nonnegative real line. A magenta dot indicates a stable state and a cyan dot indicates an unstable state. The dynamics in the state space undergoes a transcritical bifurcation at a1 = a2 [22]

The fitting may be undertaken by minimizing a suitable measure of discrepancy between the calculated and the observed data, for which several nonlinear optimization algorithms are available [10]. Empirical studies on models with many (>10) parameters have revealed what could be described as an "80/20" rule [9, 19]. Roughly speaking, 20% of the parameters are well constrained by the data, or "stiff": they cannot be individually altered by much without significant discrepancy between calculated and observed data. On the other hand, 80% of the parameters are poorly constrained, or "sloppy": they can be individually altered by an order of magnitude or more without a major impact on the discrepancy. The minimization landscape, therefore, does not have a single deep hole but a flat valley, with rather few dimensions orthogonal to the valley. The fitting should have localized the valley within one of the parameter regions. At present, no theory accounts for the emergence of these valleys.
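The following sketch illustrates both steps on a deliberately artificial example, a sum of two exponentials with similar decay rates (all names and values are invented for illustration): fitting by nonlinear least squares, then inspecting the eigenvalues of J^T J at the optimum, whose spread over many orders of magnitude is the signature of the flat valley.

```python
# Sketch of fitting and a stiff/sloppy diagnostic on an artificial model
# y(t) = a1*exp(-a2*t) + a3*exp(-a4*t); all names and values are invented.
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0.0, 10.0, 50)
a_true = np.array([1.0, 0.2, 0.5, 0.25])   # two similar decay rates
rng = np.random.default_rng(0)

def model(a, t):
    return a[0] * np.exp(-a[1] * t) + a[2] * np.exp(-a[3] * t)

data = model(a_true, t) + 0.01 * rng.standard_normal(t.size)

fit = least_squares(lambda a: model(a, t) - data,
                    x0=np.array([0.8, 0.3, 0.4, 0.1]))

# Eigenvalues of J^T J at the optimum: large ones are "stiff" directions,
# tiny ones are "sloppy" directions along the flat valley.
J = fit.jac
print("fitted a:", fit.x)
print("eigenvalues of J^T J:", np.linalg.eigvalsh(J.T @ J))
```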

Two approaches can be taken to this finding. On the one hand, one might seek to constrain the parameters further by acquiring more data. This raises an interesting problem of how best to design experiments to constrain the parameters efficiently. Is it better to get more of the same data or to get different kinds of data? On the other hand, one might seek to live with the sloppiness, to acknowledge that the fitted parameter values may not reflect the actual ones but nevertheless seek to draw testable conclusions from them. For instance, the stiff parameters may suggest experimental interventions whose effects are easily observable. (Whether they are also biologically interesting is another matter.) There may also be properties of the system that are themselves insensitive, or "robust," to the parameter differences. One can, in any case, simply draw conclusions based on the fits and seek to test these experimentally.

A successful test may provide some encouragement that the model has captured aspects of the mechanism that are relevant to the question under study. However, models are working hypotheses, not explanations. The conclusion that is drawn may be correct, but that may be an accident of the sloppy parameter values or the particular assumptions made. It may be correct for the wrong reasons. Molecular complexity is such that there may well be other models, based on different assumptions, that lead to the same conclusions. Modellers often think of models as finished entities. Experimentalists know better. A model is useful only as a basis for making a better model. It is through repeated tests and revised assumptions that a firmly grounded, mechanistic understanding of molecular behaviour slowly crystallizes.

Sometimes, one learns more when a test is not successful, because that immediately reveals a problem with the assumptions and stimulates a search to correct them. However, it may be all too easy, because of the sloppiness in the fitting, to refit the model using the data that invalidated the conclusions and then claim that the newly fitted model "accounts for the new data." From this, one learns nothing. It is better to follow Popperian principles and to specify in advance how a model is to be rejected. If a model cannot be rejected, it cannot tell you anything. In other areas of experimental science, it is customary to set aside some of the data to fit the model and to use another part of the data, or newly acquired data, to assess the quality of the model. In this way, a rejection criterion can be quantified and one can make objective comparisons between different models.

The kinds of approaches sketched above only work well when modelling and experiment are intimately integrated [18, 21]. As yet, few research groups are able to accomplish this, as both aspects require substantial, but orthogonal, expertise as well as appropriate integrated infrastructure for manipulating and connecting data and models.

Model Simplification

As pointed out above, complexity is a relative matter. Even the most complex model has simplifying assumptions: components have been left out; posttranslational modification states collapsed; complex interactions aggregated; spatial dimensions ignored; physiological context not made explicit. And these are just some of the things we know about, the "known unknowns." There are also the "unknown unknowns." We hope that what has been left out is not relevant to the question being asked. As always, that is a hypothesis, which may or may not turn out to be correct.

The distinction between exhaustive and minimal models is therefore more a matter of scale than of substance. However, having decided upon a point on the scale, and created a model of some complexity, there are some systematic approaches to simplifying it. Here, we discuss just two.

One of the most widely used methods is separation of time scales. A part of the system is assumed to be working significantly faster than the rest. If the faster part is capable of reaching a (quasi) steady state, then the slower part is assumed to see only that steady state and not the transient states that led to it. In some cases, this allows variables within the faster part to be eliminated from the dynamics.

Separation of time scales appears in Michaelis and Menten's pioneering study of enzyme-catalysed reactions. Their famous formula for the rate of an enzyme arises by assuming that the intermediate enzyme-substrate complex is in quasi-steady state [8]. Although the enzyme-substrate complex plays an essential role, it has been eliminated from the formula. The King–Altman procedure formalizes this process of elimination for complex enzyme mechanisms with multiple intermediates. This is an instance of a general method of linear elimination underlying several well-known formulae in enzyme kinetics, in protein allostery, and in gene transcription, as well as more modern simplifications arising in chemical reaction networks and in multienzyme posttranslational modification networks [7].
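As a reminder of how the elimination works, here is the textbook quasi-steady-state computation for the mechanism E + S ⇌ ES → E + P, with the usual rate constants k1, k−1, k2 (a standard derivation, not specific to this article; see [8, 20] for careful treatments):

```latex
% Textbook quasi-steady-state elimination for E + S <-> ES -> E + P.
\begin{align*}
  \frac{d[ES]}{dt} = k_1[E][S] - (k_{-1}+k_2)[ES] \approx 0
  &\;\Longrightarrow\;
  [ES] = \frac{[E][S]}{K_M},
  \qquad K_M = \frac{k_{-1}+k_2}{k_1},\\
  \text{and, using } E_\mathrm{tot} = [E]+[ES]:
  \qquad
  v = k_2[ES] &= \frac{k_2\,E_\mathrm{tot}\,[S]}{K_M + [S]} .
\end{align*}
```

The complex ES, though mechanistically essential, has disappeared from the final rate law, which is expressed in terms of total enzyme and substrate alone.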

An implicit assumption is often made that, after elimination, the behaviour of the simplified dynamical system approximates that of the original system. The mathematical basis for confirming this is through a singular perturbation argument and Tikhonov's Theorem [8], which can reveal the conditions on parameter values and initial conditions under which the approximation is valid. It must be said that, aside from the classical Michaelis–Menten example, few singular perturbation analyses have been undertaken. Biological intuition can be a poor guide to the right conditions. In the Michaelis–Menten case, for example, the intuitive basis for the quasi-steady state assumption is that, under in vitro conditions, substrate, S, is in excess over enzyme, E: S_tot ≫ E_tot. However, singular perturbation reveals a broader region, S_tot + K_M ≫ E_tot, where K_M is the Michaelis–Menten constant, in which the quasi-steady state approximation remains valid [20]. Time-scale separation has been widely used but cannot be expected to provide dramatic reductions in complexity; typically, the number of components is reduced twofold, not tenfold.

The other method of simplification is also based on an old trick: linearization in the neighbourhood of a steady state. The Hartman–Grobman Theorem for a dynamical system states that, in the local vicinity of a hyperbolic steady state (that is, one in which none of the eigenvalues of the Jacobian has zero real part), the nonlinear dynamics is qualitatively similar to the dynamics of the linearized system, dy/dt = (Jf)_ss y, where y = x − x_ss is the offset relative to the steady state, x_ss, and (Jf)_ss is the Jacobian matrix of the nonlinear system, dx/dt = f(x), evaluated at the steady state. Linearization simplifies the dynamics but does not reduce the number of components.
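A minimal numerical sketch of this linearization, using a finite-difference Jacobian for the illustrative two-component model introduced earlier (the steady state and step size are illustrative choices):

```python
# Sketch: linearize dx/dt = f(x) at a steady state x_ss and inspect the
# eigenvalues of the Jacobian (Jf)_ss to check hyperbolicity.
import numpy as np

A = np.array([1.0, 0.5, 2.0, 1.0])   # parameters of the earlier two-component sketch

def f(x, a=A):
    return np.array([a[0] - a[1] * x[0], a[2] * x[0] - a[3] * x[1]])

def jacobian(f, x, h=1e-6):
    """Central finite-difference Jacobian of f at x."""
    J = np.zeros((x.size, x.size))
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2.0 * h)
    return J

x_ss = np.array([2.0, 4.0])          # steady state: f(x_ss) = 0
eigs = np.linalg.eigvals(jacobian(f, x_ss))
print(eigs)   # no eigenvalue with zero real part => hyperbolic;
              # all real parts negative => locally stable
```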

Straightforward linearization has not been particularly useful for analysing molecular networks, because it loses touch with the underlying network structure. However, control engineering provides a systematic way to interrogate the linearized system and, potentially, to infer a simplified network. Such methods were widely used in physiology in the cybernetic era [5], and are being slowly rediscovered by molecular systems biologists. They are likely to be most useful when the steady state is homeostatically maintained, that is, when the underlying molecular network acts like a thermostat to maintain some internal variable within a narrow range, despite external fluctuations. Cells try to maintain nutrient levels, energy levels, pH, ionic balances, etc., fairly constant, as do organisms in respect of Claude Bernard's "milieu intérieur"; chemotaxing E. coli return to a constant tumbling rate after perturbation by attractants or repellents [23]; S. cerevisiae cells maintain a constant osmotic pressure in response to external osmotic shocks [16].

The internal structure of a linear control system can be inferred from its frequency response. If a stable linear system is subjected to a sinusoidal input, its steady-state output is a sinusoid of the same frequency but possibly with a different amplitude and phase. The amplitude gain and the phase shift, plotted as functions of frequency (the so-called Bode plots, after Hendrik Bode, who developed frequency analysis at Bell Labs), reveal a great deal about the structure of the system [1]. More generally, the art of systems engineering lies in designing a linear system whose frequency response matches specified Bode plots.
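As an illustration of the object being matched, the sketch below computes Bode data for an assumed first-order transfer function H(s) = 1/(τs + 1), roughly what a single linearized degradation or adaptation process might look like; the transfer function is an invented example, not a model from the text.

```python
# Sketch: Bode data (gain and phase versus frequency) for an assumed
# first-order transfer function H(s) = 1/(tau*s + 1).
import numpy as np
from scipy import signal

tau = 2.0
H = signal.TransferFunction([1.0], [tau, 1.0])
w, gain_db, phase_deg = signal.bode(H)       # gain in dB, phase in degrees

for wi, g, p in zip(w[::20], gain_db[::20], phase_deg[::20]):
    print(f"omega = {wi:9.4f}  gain = {g:7.2f} dB  phase = {p:7.2f} deg")
```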

The technology is now available to experimentally measure approximate cellular frequency responses in the vicinity of a steady state, at least under simple conditions. Provided the amplitude of the forcing is not too high, so that a linear approximation is reasonable, and the steady state is homeostatically maintained, reverse engineering of the Bode plots can yield a simplified linear control system that may be a useful abstraction of the complex nonlinear molecular network responsible for the homeostatic regulation [15]. Unlike time-scale separation, the reduction in complexity can be dramatic. As always, this comes at the price of a more abstract representation of the underlying biology but, crucially, one in which some of the control structure is retained. However, at present, we have little idea how to extend such frequency analysis to large perturbations, where the nonlinearities become significant, or to systems that are not homeostatic.

Frequency analysis, unlike separation of time scales, relies on data, reinforcing the point made previously that integrating modelling with experiments and data can lead to powerful synergies.

Looking Behind the Data

Experimentalists have learned the hard way to develop their conceptual understanding from experimental data. As the great Otto Warburg advised, "Solutions usually have to be found by carrying out innumerable experiments without much critical hesitation" [13]. However, sometimes the data you need is not the data you get, in which case conceptual interpretation can become risky. For instance, signalling in mammalian cells has traditionally relied on grinding up 10^6 cells and running Western blots with antibodies against specific molecular states. Such data has told us a great deal, qualitatively. However, a molecular network operates in a single cell. Quantitative data aggregated over a cell population is only meaningful if the distribution of responses in the population is well represented by its average. Unfortunately, that is not always the case, most notoriously when responses are oscillatory. The averaged response may look like a damped oscillation, while individual cells actually have regular oscillations but at different frequencies and phases [14, 17]. Even when the response is not oscillatory, single-cell analysis may reveal a bimodal response, with two apparently distinct sub-populations of cells [4]. In both cases, the very concept of an "average" response is a statistical fiction that may be unrelated to the behaviour of any cell in the population.
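A small simulation makes the point: in the sketch below, every simulated "cell" oscillates with undiminished amplitude, yet the population average decays like a damped oscillation because the cells drift out of phase (the frequency spread and sample size are arbitrary choices).

```python
# Sketch: each simulated "cell" oscillates with full amplitude, but at a
# random frequency; the cells start in phase and drift apart, so the
# population average looks damped even though no cell is damped.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 20.0, 400)
freqs = rng.normal(1.0, 0.1, size=1000)          # cell-to-cell variability
cells = np.cos(2.0 * np.pi * np.outer(freqs, t)) # all cells in phase at t = 0
avg = cells.mean(axis=0)                         # population-averaged signal

late = t > 10.0
print("single-cell amplitude for t > 10:", np.abs(cells[0, late]).max())  # ~1
print("average amplitude for t > 10:   ", np.abs(avg[late]).max())        # ~0
```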

One moral of this story is that one should always check whether averaged data is representative of the individual, whether individual molecules, cells, or organisms. It is surprising how rarely this is done. The other moral is that data interpretation should always be mechanistically grounded. No matter how intricate the process through which it is acquired, the data always arises from molecular interactions taking place in individual cells. Understanding the molecular mechanism helps us to reason correctly, to know what data are needed, and how to interpret the data we get. Mathematics is an essential tool in this, just as it was for Michaelis and Menten. Perhaps one of the reasons that biochemists of the Warburg generation were so successful, without "critical hesitation," was because Michaelis and others had already provided a sound mechanistic understanding of how individual enzymes worked. These days, systems biologists confront extraordinarily complex multienzyme networks and want to know how they give rise to cellular physiology. We need all the mathematical help we can get.

References

1. Åström, K.J., Murray, R.M.: Feedback Systems: An Introduction for Scientists and Engineers. Princeton University Press, Princeton (2008)

2. Black, J.: Drugs from emasculated hormones: the principles of syntopic antagonism. In: Frängsmyr, T. (ed.) Nobel Lectures, Physiology or Medicine 1981–1990. World Scientific, Singapore (1993)

3. Cajal, S.R.: Advice for a Young Investigator. MIT Press, Cambridge (2004)

4. Ferrell, J.E., Machleder, E.M.: The biochemical basis of an all-or-none cell fate switch in Xenopus oocytes. Science 280, 895–898 (1998)

5. Grodins, F.S.: Control Theory and Biological Systems. Columbia University Press, New York (1963)

6. Gunawardena, J.: Models in systems biology: the parameter problem and the meanings of robustness. In: Lodhi, H., Muggleton, S. (eds.) Elements of Computational Systems Biology. Wiley Book Series on Bioinformatics. Wiley, Hoboken (2010)

7. Gunawardena, J.: A linear elimination framework. http://arxiv.org/abs/1109.6231 (2011)

8. Gunawardena, J.: Modelling of interaction networks in the cell: theory and mathematical methods. In: Egelman, E. (ed.) Comprehensive Biophysics, vol. 9. Elsevier, Amsterdam (2012)

9. Gutenkunst, R.N., Waterfall, J.J., Casey, F.P., Brown, K.S., Myers, C.R., Sethna, J.P.: Universally sloppy parameter sensitivities in systems biology models. PLoS Comput. Biol. 3, 1871–1878 (2007)

10. Kim, K.A., Spencer, S.L., Albeck, J.G., Burke, J.M., Sorger, P.K., Gaudet, S., Kim, D.H.: Systematic calibration of a cell signaling network model. BMC Bioinformatics 11, 202 (2010)

11. Kirschner, M.: The meaning of systems biology. Cell 121, 503–504 (2005)

12. Kirschner, M.W., Gerhart, J.C.: The Plausibility of Life. Yale University Press, New Haven (2005)

13. Krebs, H.: Otto Warburg: Cell Physiologist, Biochemist and Eccentric. Clarendon, Oxford (1981)

14. Lahav, G., Rosenfeld, N., Sigal, A., Geva-Zatorsky, N., Levine, A.J., Elowitz, M.B., Alon, U.: Dynamics of the p53-Mdm2 feedback loop in individual cells. Nat. Genet. 36, 147–150 (2004)

15. Mettetal, J.T., Muzzey, D., Gomez-Uribe, C., van Oudenaarden, A.: The frequency dependence of osmo-adaptation in Saccharomyces cerevisiae. Science 319, 482–484 (2008)

16. Muzzey, D., Gomez-Uribe, C.A., Mettetal, J.T., van Oudenaarden, A.: A systems-level analysis of perfect adaptation in yeast osmoregulation. Cell 138, 160–171 (2009)

17. Nelson, D.E., Ihekwaba, A.E.C., Elliott, M., Johnson, J.R., Gibney, C.A., Foreman, B.E., Nelson, G., See, V., Horton, C.A., Spiller, D.G., Edwards, S.W., McDowell, H.P., Unitt, J.F., Sullivan, E., Grimley, R., Benson, N., Broomhead, D., Kell, D.B., White, M.R.H.: Oscillations in NF-κB control the dynamics of gene expression. Science 306, 704–708 (2004)

18. Neumann, L., Pforr, C., Beaudoin, J., Pappa, A., Fricker, N., Krammer, P.H., Lavrik, I.N., Eils, R.: Dynamics within the CD95 death-inducing signaling complex decide life and death of cells. Mol. Syst. Biol. 6, 352 (2010)

19. Rand, D.A.: Mapping the global sensitivity of cellular network dynamics: sensitivity heat maps and a global summation law. J. R. Soc. Interface 5(Suppl 1), S59–S69 (2008)

20. Segel, L.: On the validity of the steady-state assumption of enzyme kinetics. Bull. Math. Biol. 50, 579–593 (1988)

21. Spencer, S.L., Gaudet, S., Albeck, J.G., Burke, J.M., Sorger, P.K.: Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature 459, 428–433 (2009)

22. Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry and Engineering. Perseus Books, Cambridge (2001)

23. Yi, T.M., Huang, Y., Simon, M.I., Doyle, J.: Robust perfect adaptation in bacterial chemotaxis through integral feedback control. Proc. Natl. Acad. Sci. U.S.A. 97, 4649–4653 (2000)