NCEO 2 Day Intensive Course on Data Assimilation 1
Ensemble methods for data assimilation
Stefano Migliorini
National Centre for Earth Observation
University of Reading
Contents
• The weather prediction problem
• Need for data assimilation
• The Kalman filter
• Data assimilation techniques for NWP
• Ensemble methods: features and issues
• Examples with idealized models
Numerical weather prediction
• The mass conservation equation, the momentum
equation, the energy equation, the concentration
equation for humidity and the equation of state
give expressions for the rates of change of x(r,t)
= (v, T, p, ρ, q)
• all the properties of the system can in principle
be calculated from nonlinear partial differential
equations (PDEs) and boundary and
initial conditions:
$\mathbf{x}_{t_1} = M(\mathbf{x}_{t_0})$
Predictability
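• Consider two initial atmospheric states $\mathbf{x}_{t_0}$ and
$\tilde{\mathbf{x}}_{t_0}$ which differ by an infinitesimally small quantity
(or “error”) $\boldsymbol{\varepsilon}_0$:
$\tilde{\mathbf{x}}_{t_0} = \mathbf{x}_{t_0} + \boldsymbol{\varepsilon}_0$
• An initial difference (or “error”) between two
solutions (or “analogs”) to the equations for the
evolution of the (atmospheric) state may grow
with time:
$M(\tilde{\mathbf{x}}_{t_0}) = M(\mathbf{x}_{t_0} + \boldsymbol{\varepsilon}_0) \approx M(\mathbf{x}_{t_0}) + \mathbf{M}(\mathbf{x}_{t_0})\,\boldsymbol{\varepsilon}_0$
$\boldsymbol{\varepsilon}_1 = M(\tilde{\mathbf{x}}_{t_0}) - M(\mathbf{x}_{t_0}) \approx \mathbf{M}(\mathbf{x}_{t_0})\,\boldsymbol{\varepsilon}_0$
$\boldsymbol{\varepsilon}_n \approx \mathbf{M}(\mathbf{x}_{t_{n-1}}) \cdots \mathbf{M}(\mathbf{x}_{t_0})\,\boldsymbol{\varepsilon}_0 = \mathbf{M}(t_n : t_0)\,\boldsymbol{\varepsilon}_0$
where $\mathbf{M}(\mathbf{x})$ is the Jacobian (tangent linear operator) of $M$ evaluated at $\mathbf{x}$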
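[Figure: schematic of two trajectories. The state $\mathbf{x}_{t_0}$ evolves to $\mathbf{x}_{t_1} = M(\mathbf{x}_{t_0})$; the perturbed state $\tilde{\mathbf{x}}_{t_0} = \mathbf{x}_{t_0} + \boldsymbol{\varepsilon}_0$ evolves to $\tilde{\mathbf{x}}_{t_1} = M(\tilde{\mathbf{x}}_{t_0}) \approx M(\mathbf{x}_{t_0}) + \mathbf{M}(\mathbf{x}_{t_0})\,\boldsymbol{\varepsilon}_0 = \mathbf{x}_{t_1} + \boldsymbol{\varepsilon}_1$]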
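The growth of a small initial error, and how well the tangent linear estimate follows it, can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the logistic map stands in for the forecast model M (it is not part of the course material), and the function names, the initial state and the perturbation size are arbitrary choices.

```python
def model(x, r=4.0):
    """Toy nonlinear model M: one step of the logistic map (an assumed stand-in)."""
    return r * x * (1.0 - x)


def tangent_linear(x, r=4.0):
    """Derivative dM/dx of the toy model, evaluated at x."""
    return r * (1.0 - 2.0 * x)


def error_growth(x0, eps0, n_steps):
    """Return the actual error and its tangent linear estimate after each step."""
    x_ref, x_pert, eps_tl = x0, x0 + eps0, eps0
    actual, linear = [], []
    for _ in range(n_steps):
        # Linear estimate: propagate the error with the Jacobian at the reference state.
        eps_tl = tangent_linear(x_ref) * eps_tl
        # Nonlinear truth: propagate both states with the full model.
        x_ref, x_pert = model(x_ref), model(x_pert)
        actual.append(abs(x_pert - x_ref))
        linear.append(abs(eps_tl))
    return actual, linear


actual, linear = error_growth(x0=0.3, eps0=1e-6, n_steps=40)
for step in (1, 5, 10, 20, 40):
    print(step, actual[step - 1], linear[step - 1])
```

While the error is still small the two columns should agree closely. Once the error is no longer small, the actual difference is bounded by the size of the model's range, whereas the linear estimate is not.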
The need for data assimilation
• Weather and climate forecasts are uncertain due to uncertainty in
– Initial and boundary conditions
– Models (e.g., hydrostatic assumption or shallow water approximation; discretization)
– Parameters (e.g., CO2 sources and sinks)
• Not only do we need accurate models, but also good knowledge of initial (and boundary) conditions
• It is essential that good quality, widespread and frequent measurements (e.g., from satellites) are assimilated in NWP models in order to achieve good quality forecasts
Data coverage at ECMWF
[Figure: data coverage maps for surface stations, buoys, aircraft and radiosondes. Courtesy ECMWF]
The global satellite operational observing system
Data coverage at ECMWF
[Figure: data coverage maps for infrared AMVs, microwave imagers, IASI and bending angle. Courtesy ECMWF]
Observation model
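• Discrete stochastic observation model:
$\mathbf{y}_k = H(\mathbf{x}_k) + \boldsymbol{\varepsilon}_k$
• Where we assume:
• $E\{\boldsymbol{\varepsilon}_k\} = \mathbf{0}$ for all k
• $E\{\boldsymbol{\varepsilon}_k \boldsymbol{\varepsilon}_j^T\} = \mathbf{R}_k$ if j = k, and $\mathbf{0}$ if j ≠ k
• $E\{\mathbf{x}_0 \boldsymbol{\varepsilon}_k^T\} = \mathbf{0}$
• $E\{\boldsymbol{\varepsilon}_k \boldsymbol{\eta}_j^T\} = \mathbf{0}$ (with $\boldsymbol{\eta}_j$ the model error)
[Figure: timeline t0, t1, …, tk, … showing the observation $\mathbf{y}_k$ of the state $\mathbf{x}_k$ at time tk]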
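Observational issues
• Bias monitoring
• Quality control:
$\mathrm{cov}(\mathbf{y} - \mathbf{H}\mathbf{x}_b) = \mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^T$
obs rejected when $|y - h x_b|$ is too large compared with $(\sigma_o^2 + h^2\sigma_b^2)^{1/2}$
• Correlated observation error
Whitening filter:
$\mathbf{R}^{-1/2}\mathbf{y}^o = \mathbf{R}^{-1/2}\mathbf{H}\mathbf{x}^t + \mathbf{R}^{-1/2}\boldsymbol{\varepsilon}$
[Figure: HIRS channel 5 on NOAA-14 satellite; HIRS channel 5 on NOAA-16 satellite. Courtesy Dick Dee, ECMWF]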
Stochastic-dynamic model
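• In the presence of model error the dynamical system is described by
$\mathbf{x}_k = M(\mathbf{x}_{k-1}) + \boldsymbol{\eta}_k$
• Where $\mathbf{x}_0$ is random with $E\{\mathbf{x}_0\} = \mathbf{m}_0$ and $E\{(\mathbf{x}_0 - \mathbf{m}_0)(\mathbf{x}_0 - \mathbf{m}_0)^T\} = \mathbf{P}_0$
• We assume (simplifying assumptions!):
• $E\{\boldsymbol{\eta}_k\} = \mathbf{0}$ for all k and $E\{\boldsymbol{\eta}_k \boldsymbol{\eta}_j^T\} = \mathbf{Q}_k$ if j = k, and $\mathbf{0}$ if j ≠ k
• Model error uncorrelated with the initial state:
$E\{\boldsymbol{\eta}_k \mathbf{x}_0^T\} = \mathbf{0}$
• This means $\mathbf{x}_k$ is random with probability density $p(\mathbf{x}_k)$
• $\mathbf{x}_k$ (or, equivalently, $p(\mathbf{x}_k)$) is to be determined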
The data assimilation problem
• Consider a set of realizations of observations
$Y_l = \{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_l\}$
• When k≤l (i.e., for tk≤tl), the conditional
probability density $p(\mathbf{x}_k|Y_l)$ provides the solution
of the data assimilation (or filtering) problem
• From $p(\mathbf{x}_k|Y_l)$ we could calculate $E\{\mathbf{x}_k|Y_l\}$ (the
estimate of $\mathbf{x}_k$ at time tk or analysis) and the
covariance matrix
Data assimilation for NWP
• Is data assimilation (or probabilistic forecasting) a
solved problem?? Certainly not for NWP
applications!
• If we sample the probability to find a random
variable between 0 and 1 with a 0.01 resolution,
we need 101 sample points (say 100 for simplicity)
• How many sample points do we need to map the
probability space in the case of a random vector
with 7 components? (think of x(r,t) = (v, T, p, ρ,
q))
The “curse of dimensionality”
• We need, you guessed it, $100^7 = 10^{14}$ sample points. And to store that many values on a computer we need a hundred terabytes of memory, for each grid point! This is a lot of hard disks…
• Note that the current Met Office operational global NWP model considers 1024 x 769 grid points over 70 vertical levels (~55 million grid points) at a given time!
• This means that, by today’s standards, it is not possible to sample the pdf of the state vector used in NWP
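The numbers on this slide can be checked with a few lines of arithmetic. This is a rough sketch: the variable names are illustrative, and the storage figure assumes one byte per sample point, which is an assumption made here to reproduce the hundred-terabyte estimate.

```python
RESOLUTION_POINTS = 100   # sample points per component (0.01 resolution)
N_COMPONENTS = 7          # x = (v, T, p, rho, q), with v a 3-component vector
BYTES_PER_SAMPLE = 1      # assumption: one byte stored per sample point

samples_per_grid_point = RESOLUTION_POINTS ** N_COMPONENTS
terabytes_per_grid_point = samples_per_grid_point * BYTES_PER_SAMPLE / 1e12

grid_points = 1024 * 769 * 70   # horizontal grid points times vertical levels

print(f"samples per grid point: {samples_per_grid_point:.0e}")
print(f"storage per grid point: {terabytes_per_grid_point:.0f} TB")
print(f"grid points: {grid_points:,}")
```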
The Kalman filter
• The good news is: under certain conditions we may
not need to keep track of the whole joint state pdf
• If all relevant errors (i.e., model, initial condition
and observation errors) are Gaussian, their pdfs are
completely known if their means and covariances are
known
• When the model and the observation operator are
linear, optimal estimates of the mean and the
covariance of p(xk|Yl) are given by the Kalman filter,
which is then the solution of the data assimilation problem.
• In the presence of (moderate) nonlinearities (typical
case in NWP): extended Kalman filter
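The extended Kalman filter
• Let us now consider the nonlinear case with
Gaussian errors
$\mathbf{x}_{k+1} = M(\mathbf{x}_k) + \boldsymbol{\eta}_{k+1}$, $\boldsymbol{\eta}_{k+1} \sim N(\mathbf{0}, \mathbf{Q}_{k+1})$, $\mathbf{x}_k \sim N(\mathbf{x}^a_k, \mathbf{P}^a_k)$
$\mathbf{y}_k = H(\mathbf{x}_k) + \boldsymbol{\varepsilon}_k$, $\boldsymbol{\varepsilon}_{k+1} \sim N(\mathbf{0}, \mathbf{R}_{k+1})$
• Forecast step
$\mathbf{x}^f_{k+1} = M(\mathbf{x}^a_k)$
$\mathbf{P}^f_{k+1} = \mathbf{M}\mathbf{P}^a_k\mathbf{M}^T + \mathbf{Q}_{k+1}$
• Data assimilation step
$\mathbf{x}^a_{k+1} = \mathbf{x}^f_{k+1} + \mathbf{K}\,(\mathbf{y}_{k+1} - H(\mathbf{x}^f_{k+1}))$
$\mathbf{K} = \mathbf{P}^f_{k+1}\mathbf{H}^T(\mathbf{H}\mathbf{P}^f_{k+1}\mathbf{H}^T + \mathbf{R}_{k+1})^{-1}$
$\mathbf{P}^a_{k+1} = (\mathbf{I} - \mathbf{K}\mathbf{H})\,\mathbf{P}^f_{k+1}$
where the Jacobians are $\mathbf{M} = \partial M / \partial \mathbf{x}_k$ evaluated at $\mathbf{x}_k = \mathbf{x}^a_k$ and $\mathbf{H} = \partial H / \partial \mathbf{x}_{k+1}$ evaluated at $\mathbf{x}_{k+1} = \mathbf{x}^f_{k+1}$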
Example: scalar and linear case
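• Updating at $t_{k+1}$ prior information valid at $t_k$ with $y_{k+1}$
$x_{k+1} = a x_k + \eta_{k+1}$, $y_{k+1} = x_{k+1} + \varepsilon_{k+1}$
$x^f_{k+1} = a x^f_k$
$P^f_{k+1} = (\sigma^f_{k+1})^2 = a^2(\sigma^f_k)^2 + (\sigma^\eta_{k+1})^2$
$K = \dfrac{(\sigma^f_{k+1})^2}{(\sigma^f_{k+1})^2 + (\sigma^\varepsilon_{k+1})^2}$
$x^a_{k+1} = a x^f_k + \dfrac{(\sigma^f_{k+1})^2}{(\sigma^f_{k+1})^2 + (\sigma^\varepsilon_{k+1})^2}\,(y_{k+1} - a x^f_k)$
$P^a_{k+1} = (\sigma^a_{k+1})^2 = \dfrac{(\sigma^f_{k+1})^2(\sigma^\varepsilon_{k+1})^2}{(\sigma^f_{k+1})^2 + (\sigma^\varepsilon_{k+1})^2}$, with $(\sigma^f_{k+1})^2 = a^2(\sigma^f_k)^2 + (\sigma^\eta_{k+1})^2$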
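The scalar forecast and update equations can be written out directly in code. The sketch below is a minimal illustration, not code from the course: the function name `scalar_kf_step` and the numbers in the usage example are made up for the purpose.

```python
def scalar_kf_step(x_prior, var_prior, y_obs, a, var_model, var_obs):
    """One forecast/update cycle of the scalar, linear Kalman filter."""
    # Forecast step: propagate the prior mean and variance with x_{k+1} = a x_k + eta.
    x_f = a * x_prior
    var_f = a * a * var_prior + var_model
    # Update step: weight the observation y = x + eps by the Kalman gain.
    gain = var_f / (var_f + var_obs)
    x_a = x_f + gain * (y_obs - x_f)
    var_a = (1.0 - gain) * var_f
    return x_f, var_f, gain, x_a, var_a


x_f, var_f, gain, x_a, var_a = scalar_kf_step(
    x_prior=1.0, var_prior=1.0, y_obs=2.0, a=1.0, var_model=0.0, var_obs=1.0
)
print(x_f, var_f, gain, x_a, var_a)  # 1.0 1.0 0.5 1.5 0.5
```

With equal forecast and observation variances the analysis lies halfway between the forecast and the observation, and the analysis variance is half the forecast variance.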
Evolution of the pdf (with update)
[Figure parameters: $a = 1.2$, $x_0 = 50$, $(\sigma^f_0)^2 = 0.5$, $(\sigma^\eta)^2 = 0.1$]
Evolution of the pdf: cycling
Use of Kalman filters in NWP
• To compute (not just store) the covariance matrix
update $\mathbf{P}^a_{k+1}$, it can be shown that we need ~ $n^3$
flops (n ~ $10^7$ at the Met Office), i.e. $10^{21}$ flops
(floating-point operations)
• The IBM supercomputer at ECMWF has a maximal
achieved performance of ~100 TFlops/s,
i.e. ~ $10^{14}$ flops/s. We need ~ $10^7$ s to compute $\mathbf{P}^a_{k+1}$,
that is about four months!
• Clearly, we need to resort to some approximations
and/or alternative strategies
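The cost estimate can be reproduced with a short calculation. The variable names below are illustrative only; the figures are the orders of magnitude quoted on the slide. At these rates the update takes roughly 116 days, i.e. a little under four months.

```python
n = 10**7                   # state dimension quoted for the Met Office model
flops_needed = n**3         # ~n^3 operations for the covariance update
flops_per_second = 10**14   # ~100 TFlops/s achieved performance

seconds = flops_needed / flops_per_second
days = seconds / 86_400     # seconds in a day
print(f"{flops_needed:.0e} flops -> {seconds:.0e} s = {days:.0f} days")
```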
Data assimilation techniques for NWP
• The most widespread data assimilation approach used for NWP is based on variational techniques
• 4D-Var calculates the optimal model trajectory over a given time window (typically 6h or 12h)
by taking into account observations taken within the same window and some prior information.
• In the linear case, it is equivalent to a Kalman
smoother initialized with the same prior information used in 4D-Var
• We now discuss an approximation of the Kalman filter based on a set of ensemble members (i.e., based on Monte Carlo techniques)
Ensemble methods
• Consider a random variable x distributed according to f(x), with unknown mean m and
variance $\sigma^2$
• From f(x) we can generate an ensemble of N independent realizations of x: $x_1, x_2, \ldots, x_N$.
• We can estimate m and $\sigma^2$ via the sample (or ensemble) mean $\bar{x}$ and sample variance $s^2$:
$\bar{x} = \dfrac{1}{N}\sum_{i=1}^{N} x_i$
$s^2 = \dfrac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$
• They are both unbiased estimators and their
variances are
$\mathrm{Var}(\bar{x}) = \dfrac{\sigma^2}{N}$
$\mathrm{Var}(s^2) = \dfrac{2\sigma^4}{N-1}$ (for Gaussian x)
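The 1/sqrt(N) behaviour of the standard deviation of the ensemble mean can be checked empirically. The following sketch uses only the Python standard library; the function name, the seed and the number of repetitions are arbitrary choices made for illustration.

```python
import math
import random
import statistics


def spread_of_ensemble_mean(n_members, n_ensembles, rng):
    """Sample standard deviation of the ensemble mean over many N(0, 1) ensembles."""
    means = [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n_members)])
        for _ in range(n_ensembles)
    ]
    return statistics.stdev(means)


rng = random.Random(0)  # fixed seed so the run is repeatable
for n_members in (4, 25, 100):
    empirical = spread_of_ensemble_mean(n_members, n_ensembles=2000, rng=rng)
    expected = 1.0 / math.sqrt(n_members)
    print(f"N={n_members:4d}  empirical={empirical:.3f}  expected={expected:.3f}")
```

Quadrupling the ensemble size only halves the sampling error of the mean, which is why small ensembles give noisy statistics.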
Standard deviation of the mean: empirical vs. expected values
• Gray dots: mean values of N ensemble members drawn from a Gaussian with zero
mean and unit variance, for a total of 100 mean values for each N value
• Asterisks: sample mean of the 100 means and sample standard deviation of the mean
• Solid line: expected value of the mean (= 0); dashed line: expected standard deviation
of the mean (= 1/sqrt(N))
Ensemble Kalman filter
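• Replace $\mathbf{x}^t$ with $\overline{\mathbf{x}}$, the mean of an ensemble of
forecasts
• Replace $\mathbf{P}^f$ and $\mathbf{P}^a$ with $\mathbf{P}^f_e$ and $\mathbf{P}^a_e$ calculated
from an ensemble of forecasts (error ~1/N)
$\mathbf{P}^f = E\{(\mathbf{x}^f_i - \mathbf{x}^t)(\mathbf{x}^f_i - \mathbf{x}^t)^T\} \approx \overline{(\mathbf{x}^f_i - \overline{\mathbf{x}^f})(\mathbf{x}^f_i - \overline{\mathbf{x}^f})^T} = \mathbf{P}^f_e$
$\mathbf{P}^a = E\{(\mathbf{x}^a_i - \mathbf{x}^t)(\mathbf{x}^a_i - \mathbf{x}^t)^T\} \approx \overline{(\mathbf{x}^a_i - \overline{\mathbf{x}^a})(\mathbf{x}^a_i - \overline{\mathbf{x}^a})^T} = \mathbf{P}^a_e$
where the overbar denotes the ensemble mean with, e.g., $\overline{\mathbf{x}^a}$ the mean of
$(\mathbf{x}^a_1, \mathbf{x}^a_2, \ldots, \mathbf{x}^a_i, \ldots, \mathbf{x}^a_N)$
• We define
$\mathbf{x}^a_i = \mathbf{x}^f_i + \mathbf{P}^f_e\mathbf{H}^T(\mathbf{H}\mathbf{P}^f_e\mathbf{H}^T + \mathbf{R})^{-1}(\mathbf{y}^o_i - H(\mathbf{x}^f_i))$
with
$\mathbf{P}^f_e\mathbf{H}^T \equiv \overline{(\mathbf{x}^f_i - \overline{\mathbf{x}^f})(H(\mathbf{x}^f_i) - \overline{H(\mathbf{x}^f_i)})^T} \approx \mathbf{P}^f\mathbf{H}^T$
$\mathbf{H}\mathbf{P}^f_e\mathbf{H}^T \equiv \overline{(H(\mathbf{x}^f_i) - \overline{H(\mathbf{x}^f_i)})(H(\mathbf{x}^f_i) - \overline{H(\mathbf{x}^f_i)})^T} \approx \mathbf{H}\mathbf{P}^f\mathbf{H}^T$
and, e.g.,
$\mathbf{y}^o_i = \mathbf{y}^o + \boldsymbol{\varepsilon}_i$, $\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R})$
$\mathbf{x}^f_i(t_k) = M(\mathbf{x}^a_i(t_{k-1})) + \boldsymbol{\eta}_i(t_k)$, $\boldsymbol{\eta}_i \sim N(\mathbf{0}, \mathbf{Q})$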
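The update with perturbed observations can be sketched for the simplest possible case: a scalar state and a scalar observation. This is an illustrative sketch, not the course's implementation; the function name `enkf_update`, the ensemble size and the test values are assumptions. The gain is built from ensemble estimates of P^f H^T and H P^f H^T obtained by applying the observation operator `h` to each member, so `h` is never linearized.

```python
import random


def enkf_update(forecast, y_obs, obs_var, h, rng):
    """Perturbed-observation EnKF update for a scalar state and a scalar observation."""
    n = len(forecast)
    hx = [h(x) for x in forecast]  # full (possibly nonlinear) H applied to each member
    x_mean = sum(forecast) / n
    hx_mean = sum(hx) / n
    # Ensemble estimates of P^f H^T and H P^f H^T; P^f itself is never formed.
    pf_ht = sum((x - x_mean) * (z - hx_mean) for x, z in zip(forecast, hx)) / (n - 1)
    h_pf_ht = sum((z - hx_mean) ** 2 for z in hx) / (n - 1)
    gain = pf_ht / (h_pf_ht + obs_var)
    obs_std = obs_var ** 0.5
    # Each member assimilates its own perturbed observation y_i = y + eps_i.
    return [
        x + gain * (y_obs + rng.gauss(0.0, obs_std) - z)
        for x, z in zip(forecast, hx)
    ]


rng = random.Random(0)
forecast = [rng.gauss(0.0, 1.0) for _ in range(2000)]
analysis = enkf_update(forecast, y_obs=1.0, obs_var=1.0, h=lambda x: x, rng=rng)
mean_a = sum(analysis) / len(analysis)
var_a = sum((x - mean_a) ** 2 for x in analysis) / (len(analysis) - 1)
print(f"analysis mean {mean_a:.2f}, analysis variance {var_a:.2f}")
```

For this linear, Gaussian test case the Kalman filter gives an analysis mean of 0.5 and an analysis variance of 0.5, so the printed values should be close to those numbers, up to sampling error.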
EnKF features
• Unlike the EKF, the EnKF makes use of the full nonlinear model M: more accurate determination of error growth (and of error saturation)
• The Kalman gain K is computed without the need to calculate $\mathbf{P}^f$. This is particularly advantageous when observations are processed serially
• No need to linearize the observation operator H
• Each ensemble member can be propagated forward in time independently: highly parallel algorithm
• Model error term can be included as a perturbation of the deterministic forecast
Covariance localization
• When the forecast error covariance is misspecified
(e.g., due to neglecting model error, or when N
<< n), it may include spurious correlations
between very distant grid points
• A common solution is to multiply each element of $\mathbf{P}^f_e$
by an appropriate weight that reduces
long-distance correlations
• This ensures that only the components of $\mathbf{P}^f_e$
believed to represent the corresponding
components of $\mathbf{P}^f$ accurately are retained
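Localization amounts to an element-wise (Schur) product between the ensemble covariance and a matrix of distance-dependent weights. The sketch below uses the fifth-order piecewise rational function of Gaspari and Cohn as the weight, which is one commonly used compactly supported correlation function; the slides do not say which function is meant, so this choice, the function names and the toy 5x5 matrix are all assumptions.

```python
def gaspari_cohn(distance, half_width):
    """Compactly supported correlation function; zero beyond 2 * half_width."""
    r = abs(distance) / half_width
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r <= 2.0:
        return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
                - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    return 0.0


def localize(cov, half_width):
    """Element-wise (Schur) product of cov with weights based on index distance |i - j|."""
    n = len(cov)
    return [
        [cov[i][j] * gaspari_cohn(i - j, half_width) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 5x5 sample covariance with a spurious 0.3 in every off-diagonal entry.
cov = [[1.0 if i == j else 0.3 for j in range(5)] for i in range(5)]
localized = localize(cov, half_width=1.5)
print([round(v, 3) for v in localized[0]])
```

The diagonal is left unchanged, nearby covariances are damped, and entries separated by more than twice the half-width are set to zero.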
Covariance localization: an example
(a) $\mathbf{P}^f_e$ (N=25)
(b) $\mathbf{P}^f_e$ (N=100)
(c) Correlation function with compact support
(d) localized $\mathbf{P}^f_e$ (N=25)
From Fig. 6.4 of Hamill, 2006
Other issues
• In the extra-tropics at large scale the atmosphere is
in near-geostrophic balance. An inappropriate
(e.g., too narrow) localization may give rise to
unbalanced increments (e.g., between height
gradient and wind increments)
• Analysis update equations are optimal only when
errors are Gaussian (not necessarily the case)
• Unless model error is properly taken into account,
forecast errors may be underestimated
• Computational expense is proportional to the number of
observations: problems when obs are “too many”
EnKF variants
• To estimate $\mathbf{P}^a_e$ correctly, the EnKF algorithm requires the
observations to be treated as random variables: stochastic or
fully-Monte Carlo algorithm
• An alternative strategy consists in requiring $\mathbf{P}^a_e$ to be
consistent with the analysis error covariance prescribed by the
Kalman filter: deterministic algorithms (nonuniqueness).
• An advantage of deterministic algorithms is that they avoid
misrepresenting observation error statistics (due to finite
sampling)
• Deterministic algorithms prescribe the expression of the matrix
of the ensemble member perturbations (from the mean), that
is a square-root (or, more precisely, Cholesky factor) of $\mathbf{P}^a_e$:
also called square-root algorithms (e.g., EnSRF)
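A deterministic update can be sketched for a scalar state observed directly (H = 1). The mean is updated with the usual Kalman gain K, while the perturbations are updated with a reduced gain αK, where α = 1/(1 + sqrt(R/(HP^fH^T + R))) is the factor used in serial square-root filters of the EnSRF type. No observation perturbations are drawn, and the analysis ensemble variance equals (1 - K) times the forecast ensemble variance exactly. The function name `ensrf_update` and the numbers are illustrative assumptions.

```python
import math
import statistics


def ensrf_update(forecast, y_obs, obs_var):
    """Deterministic square-root update for a scalar state observed directly (H = 1)."""
    mean_f = statistics.fmean(forecast)
    perturbations = [x - mean_f for x in forecast]
    var_f = statistics.variance(forecast)  # ensemble estimate of P^f
    gain = var_f / (var_f + obs_var)
    # Reduced gain for the perturbations, so that var_a = (1 - gain) * var_f exactly.
    alpha = 1.0 / (1.0 + math.sqrt(obs_var / (var_f + obs_var)))
    mean_a = mean_f + gain * (y_obs - mean_f)
    return [mean_a + (1.0 - alpha * gain) * p for p in perturbations]


forecast = [0.0, 1.0, 2.0, 3.0, 4.0]  # ensemble mean 2.0, variance 2.5
analysis = ensrf_update(forecast, y_obs=4.0, obs_var=2.5)
print(statistics.fmean(analysis), statistics.variance(analysis))
```

Here K = 0.5, so the analysis mean should be 3.0 and the analysis variance 1.25, up to floating-point rounding.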
Conclusions
• Ensemble-based algorithms are establishing
themselves as a viable alternative to the more
established variational techniques for NWP data
assimilation
• Their ease of implementation makes it
easier for scientists (including PhD students!) to
contribute to research in data assimilation
• They provide a natural framework to generate
probabilistic forecasts, i.e. forecasts with a
quantitative estimate of their errors