local forecast probability distributions by statistical...
TRANSCRIPT
Local forecast of probability Local forecast of probability distributions by statistical distributions by statistical
adaptation adaptation S. FARGES (S. FARGES ([email protected])[email protected])
DPREVI/GCRIDPREVI/GCRI April 05 2012April 05 2012
Introduction
In an atmospheric model, coasts, orography, soil type andvegetation status are not perfectly well represented. In addition forcomputation purposes that space is discretized ; by a way ofconsequence a statistical post-processing is necessary toaccount for the local climatology of the site (station).
This is called statistical adaptation of numerical model outputs(SA). Meteo-France currently produces approximately 3.5 billionforecasts using SA per day (for a station, a parameter and timerange given).
Uncertainty is present since the start of the forecast covered timerange. We want to increase the use of a probabilisticapproach, including, if possible, by forecasting the distributions ofthe parameters. We have several methods available to us, aswe shall see...
Forecast of a probabilistic distribution : different ways
1. Dynamic approach : the ensemble forecast
I The ensemblist systems (varEPS, NCEP, PEARP...) :I Advantages : realizations of trajectories and maps, extreme
phenomena forecasting (if enough members).I Disadvantage : cost (⇒ limited resolution).
I The multi-models systems (IFS + ARPEGE + ...) :I Advantages : realizations of trajectories and maps, quality of
the deterministic forecast in the short time range.I Disadvantages : more time to collect the data from the various
producers and limited number of ensemble members.
I The multi-ensemblists systems (TIGGE) :I Advantages and disadvantages of previously described
approaches.
Often require post-processing statistics (because the producedprobabilities are often unreliable) : Ensemble Dressing, BayesianModel Averaging (BMA), Nonhomogenous Gaussian Regression(NGR) or Ensemble Regression.
Forecast of a probabilistic distribution : different ways
2. Statistical approach : probabilistic statistical adaptation ofan atmospheric model
I Discrimination models (LDA, logistic regression, neuralnetworks...) :
I Advantages : cost and robustness of linear methods.I Disadvantage : production of occurrence probabilities but no
production of distribution.
I Generalized Linear Models :I Advantages : cost and possibility to use the underlying
probability distributions.I Disadvantages : limited number of supported distributions
(exponential family) not always perfectly calibrated afterfitting.
Forecast of a probabilistic distribution : different ways
Proposed new approach : the generalized linear regression byGaussian anamorphosis
I Advantages : cost, robustness (linear model), no constraint onprobabilistic distributions and strongly calibrated (in theory).Can be used for the postprocessing of an ensemble forecastcoupled with a BMA.
I Disadvantage : generally no direct computable formulation ofdeterministic forecast (expectation).
Remark : statistical methods have the disadvantage of not beingable to easily produce realistic realizations of trajectories or map.
Generalized linear regression by Gaussian anamorphosis
Φ(y) = Ψ−1 ◦ PY (y)
Φ(y)
Φ−1(z)
Ψ(z) PY (y)[z |z] [y |y ]
1. Methodology (beta distribution for humidity)
Normalized space (z) Initial space (y)
−3 −2 −1 0 1 2 3
0.0
0.2
0.4
0.6
0.8
1.0
CDF of normalized humidity forecast
Normalized humidity
Pro
babi
lity
−1 0 1 2
0.0
0.1
0.2
0.3
0.4
0.5
0.6
PDF of normalized humidity forecast
Normalized humidity
Den
sity
z =P∑
p=1
apxp (εz = z − z)
z |z ∼ N (z , σεz)
0 20 40 60 80 100
0.0
0.2
0.4
0.6
0.8
1.0
CDF of humidity observation
Humidity (%)
Pro
babi
lity
0 20 40 60 80 100
0.00
0.01
0.02
0.03
0.04
PDF of humidity forecast
Humidity (%)
Den
sity
y =
∫ +∞
−∞Φ−1(z)ϕ(z , z , σεz
)dz
[y |y ] = [y ]ϕ(Φ(y), z , σεz
)
ϕ(Φ(y))
Generalized linear regression by Gaussian anamorphosis
σεz|z
[z]
[z]
[y ]
[y ]
σεy|y
Normalized space (z) Initial space (y)
−3 −2 −1 0 1 2 3
0.4
0.5
0.6
0.7
0.8
Evolution of the conditional RMSE to normalized humidity forecast
Normalized humidity forecast
RM
SE
−3 −2 −1 0 1 2 3
0.0
0.1
0.2
0.3
0.4
0.5
PDF of forecasted and observed normalized humidity
Normalized humidity
Den
sity
σ2z = σ2
z + σ2εz
corr(z , εz) = 0 et E(εz |z) = 0
I Independance between zand εz
I Homoscedasticity
I Probabilistic and marginalcalibration of [z |z]
0 20 40 60 80 100
02
46
810
Evolution of the conditional RMSE to humidity forecast
Humidity forecast (%)
RM
SE
(%
)
0 20 40 60 80 100
0.00
00.
005
0.01
00.
015
0.02
00.
025
0.03
0
PDF of forecasted and observed humidity
Humidity (%)
Den
sity
σ2y = σ2
y + σ2εy
corr(y , εy ) = 0 et E(εy |y) = 0
I Only linear independancebetween y and εy
I No homoscedasticity
I Probabilistic and marginalcalibration of [y |y ]
Generalized linear regression by Gaussian anamorphosis
- Record value[yini ]
[ycor ]
3. Application examplesUsing a truncated normal distribution for a better forecasting ofextreme temperatures (theoretical simulation)
Theoretical gains made on RMSE
Initial RMSE (°C)1.5 2.0 2.5 3.0 3.5 4.0 4.5
0.08
0.10
0.12
0.14
0.16
RMSE conditional forecast correctedInitial RMSE = 2°C
Temperature (°C)0 5 10 15 20 25 30
0.5
1.0
1.5
2.0
Example of distribution corrected
Temperature (°C)26 28 30 32
0.0
0.1
0.2
0.3
0.4
Generalized linear regression by Gaussian anamorphosis
- New- approach
- Classical- approach
• New approach• is better
• Classical approach• is better
Forecast of the cloudiness : comparison of the qualities of themodel against those of logistic regressions (for probabilitiesforecast) and linear regression (for deterministic forecast)
Probability of the hypothesis reliability
Total cloudiness
0.0
0.2
0.4
0.6
0.8
1.0
0−0 1−2 2−4 4−7 8−8
Probability that the ROC AUC are different
Total cloudiness
0.0
0.1
0.2
0.3
0.4
0.5
0−0 1−2 2−4 4−7 8−8
Probability that the RMSE are different
Station number
0.00
0.05
0.10
0.15
0.20
1 3 5 7 9 11 13 15
Probabilistic mixed statistical adaptation
This is a statistically post-processed multi-model system.Advantage for the deterministic forecast for the short time range :
RMSE of temperature forecasts mixed versus its components (average of 25 european stations). Calculated results for the period 09/2011 to 01/2012.
Time range
0.8
1.0
1.2
1.4
1.6
1.8
2.0
2.20 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102
MOS−mixedMOS−IFS D−1 12HMOS−ARPEGE D 00H
Probabilistic mixed statistical adaptation
The goal of the probabilistic mixed SA is to fit a linear regressionmodel to each model of an ensemble set, using a BMA.
We have K models Mk . z distribution forms (in the normalizedspace) given by the BMA :
p(z |(Mk)k=1,··· ,K ,Θ, zT ) =K∑
k=1
αkp(z |Mk , θk , zT )
z |Mk , θk , zT ∼ N (
q∑i=1
aki xk
i , σk)
zT = (z1, · · · , zn), θk = ((aki )i=1,··· ,q, σk) et Θ = ((αk , θk)k=1,··· ,K )
We estimate the Θ parameters using the maximum likelihood withthe EM algorithm.
Step E (expectation) : we estimate the probabilities p(Mk |Θ, zT )
p(Mk |Θg , zT ) =αg−1
k p(zT |Mk , θg−1k )∑K
l=1 αg−1l p(zT |Ml , θ
g−1l )
Probabilistic mixed statistical adaptation
Step M (maximisation) : parameters Θ are iterativaly estimated
αgk =
1
n
n∑j=1
p(Mk |Θg , zj)
(aki )gi=1,··· ,q = (X
′kPg
k Xk)−1X′kPg
k zT
Xk matrix of model predictors k
Pgk =
p(Mk |Θg , z1) 0 · · · 0
0 p(Mk |Θg , z2) · · · 0...
.... . .
...0 0 · · · p(Mk |Θg , zn)
σgk =
√√√√ 1
nαgk
n∑j=1
p(Mk |Θg , zj)(zkgj − zj)
2
Conclusion
I Among the various ways to produce probability distributions,the generalized linear regression by Gaussian anamorphosis hasthe advantage of being inexpensive, robust, to overcomediscrimination techniques requiring a statistical model byclass, and to issue reliable probabilistic forecasts (if theobservations distribution is adjusted properly and the numberof predictors sufficiently high).
I The deterministic forecast produced by this model has goodproperties and may be better than the one obtained by alinear regression.
I Coupled with BMA, the Gaussian anamorphosis can be usedto improve the quality of an ensemble forecast. This is astatistical-dynamical system particularly promising.
Prospects : probabilistic approach and spatialization
I Multi-parameters probabilistic mixed SA (course inprogress).
I Probabilistic spatialization of SA on a regular grid. It canbe facilitated in a normalized space by anamorphosis. Idea of aprocedure :
1. Normalization of all the explanatories variables beforeregression.
2. Report of the regressions based on a spatial classification ofthe parameter analyzed by AROME model.
3. Evaluation of the observations distributions on the grid basedon the observed data of the stations and analyzed by AROME(suggestion : interpolation of the Hermite’s polynomialscoefficients with adjustement cubic splines).
I Exploiting the forecaster’s expertise in probabilisticforecasting, especially in the case of events strongly bi-modal(effect of the presence or not of low clouds on thetemperature).
The endThe end