local forecast probability distributions by statistical...

Local forecast of probability Local forecast of probability distributions by statistical distributions by statistical

adaptation adaptation S. FARGES (S. FARGES ([email protected])[email protected])

DPREVI/GCRIDPREVI/GCRI April 05 2012April 05 2012

Introduction

In an atmospheric model, coasts, orography, soil type andvegetation status are not perfectly well represented. In addition forcomputation purposes that space is discretized ; by a way ofconsequence a statistical post-processing is necessary toaccount for the local climatology of the site (station).

This is called statistical adaptation of numerical model outputs(SA). Meteo-France currently produces approximately 3.5 billionforecasts using SA per day (for a station, a parameter and timerange given).

Uncertainty is present since the start of the forecast covered timerange. We want to increase the use of a probabilisticapproach, including, if possible, by forecasting the distributions ofthe parameters. We have several methods available to us, aswe shall see...

Forecast of a probabilistic distribution : different ways

1. Dynamic approach : the ensemble forecast

I The ensemblist systems (varEPS, NCEP, PEARP...) :I Advantages : realizations of trajectories and maps, extreme

phenomena forecasting (if enough members).I Disadvantage : cost (⇒ limited resolution).

I The multi-models systems (IFS + ARPEGE + ...) :I Advantages : realizations of trajectories and maps, quality of

the deterministic forecast in the short time range.I Disadvantages : more time to collect the data from the various

producers and limited number of ensemble members.

I The multi-ensemblists systems (TIGGE) :I Advantages and disadvantages of previously described

approaches.

Often require post-processing statistics (because the producedprobabilities are often unreliable) : Ensemble Dressing, BayesianModel Averaging (BMA), Nonhomogenous Gaussian Regression(NGR) or Ensemble Regression.


2. Statistical approach : probabilistic statistical adaptation ofan atmospheric model

I Discrimination models (LDA, logistic regression, neuralnetworks...) :

I Advantages : cost and robustness of linear methods.I Disadvantage : production of occurrence probabilities but no

production of distribution.

I Generalized Linear Models :I Advantages : cost and possibility to use the underlying

probability distributions.I Disadvantages : limited number of supported distributions

(exponential family) not always perfectly calibrated afterfitting.


Proposed new approach : the generalized linear regression byGaussian anamorphosis

I Advantages : cost, robustness (linear model), no constraint onprobabilistic distributions and strongly calibrated (in theory).Can be used for the postprocessing of an ensemble forecastcoupled with a BMA.

I Disadvantage : generally no direct computable formulation ofdeterministic forecast (expectation).

Remark : statistical methods have the disadvantage of not beingable to easily produce realistic realizations of trajectories or map.

Generalized linear regression by Gaussian anamorphosis

Φ(y) = Ψ−1 ◦ PY (y)

Φ(y)

Φ−1(z)

Ψ(z) PY (y)[z |z] [y |y ]

1. Methodology (beta distribution for humidity)

Normalized space (z) Initial space (y)

−3 −2 −1 0 1 2 3

0.0

0.2

0.4

0.6

0.8

1.0

CDF of normalized humidity forecast

Normalized humidity

Pro

babi

lity

−1 0 1 2

0.0

0.1

0.2

0.3

0.4

0.5

0.6

PDF of normalized humidity forecast

Normalized humidity

Den

sity

z =P∑

p=1

apxp (εz = z − z)

z |z ∼ N (z , σεz)

0 20 40 60 80 100

0.0

0.2

0.4

0.6

0.8

1.0

CDF of humidity observation

Humidity (%)

Pro

babi

lity

0 20 40 60 80 100

0.00

0.01

0.02

0.03

0.04

PDF of humidity forecast

Humidity (%)

Den

sity

y =

∫ +∞

−∞Φ−1(z)ϕ(z , z , σεz

)dz

[y |y ] = [y ]ϕ(Φ(y), z , σεz

)

ϕ(Φ(y))


σεz|z

[z]

[z]

[y ]

[y ]

σεy|y

Normalized space (z) Initial space (y)

−3 −2 −1 0 1 2 3

0.4

0.5

0.6

0.7

0.8

Evolution of the conditional RMSE to normalized humidity forecast

Normalized humidity forecast

RM

SE

−3 −2 −1 0 1 2 3

0.0

0.1

0.2

0.3

0.4

0.5

PDF of forecasted and observed normalized humidity

Normalized humidity

Den

sity

σ2z = σ2

z + σ2εz

corr(z , εz) = 0 et E(εz |z) = 0

I Independance between zand εz

I Homoscedasticity

I Probabilistic and marginalcalibration of [z |z]

0 20 40 60 80 100

02

46

810

Evolution of the conditional RMSE to humidity forecast

Humidity forecast (%)

RM

SE

(%

)

0 20 40 60 80 100

0.00

00.

005

0.01

00.

015

0.02

00.

025

0.03

0

PDF of forecasted and observed humidity

Humidity (%)

Den

sity

σ2y = σ2

y + σ2εy

corr(y , εy ) = 0 et E(εy |y) = 0

I Only linear independancebetween y and εy

I No homoscedasticity

I Probabilistic and marginalcalibration of [y |y ]


- Record value[yini ]

[ycor ]

3. Application examplesUsing a truncated normal distribution for a better forecasting ofextreme temperatures (theoretical simulation)

Theoretical gains made on RMSE

Initial RMSE (°C)1.5 2.0 2.5 3.0 3.5 4.0 4.5

0.08

0.10

0.12

0.14

0.16

RMSE conditional forecast correctedInitial RMSE = 2°C

Temperature (°C)0 5 10 15 20 25 30

0.5

1.0

1.5

2.0

Example of distribution corrected

Temperature (°C)26 28 30 32

0.0

0.1

0.2

0.3

0.4


- New- approach

- Classical- approach

• New approach• is better

• Classical approach• is better

Forecast of the cloudiness : comparison of the qualities of themodel against those of logistic regressions (for probabilitiesforecast) and linear regression (for deterministic forecast)

Probability of the hypothesis reliability

Total cloudiness

0.0

0.2

0.4

0.6

0.8

1.0

0−0 1−2 2−4 4−7 8−8

Probability that the ROC AUC are different

Total cloudiness

0.0

0.1

0.2

0.3

0.4

0.5

0−0 1−2 2−4 4−7 8−8

Probability that the RMSE are different

Station number

0.00

0.05

0.10

0.15

0.20

1 3 5 7 9 11 13 15

Probabilistic mixed statistical adaptation

This is a statistically post-processed multi-model system.Advantage for the deterministic forecast for the short time range :

RMSE of temperature forecasts mixed versus its components (average of 25 european stations). Calculated results for the period 09/2011 to 01/2012.

Time range

0.8

1.0

1.2

1.4

1.6

1.8

2.0

2.20 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102

MOS−mixedMOS−IFS D−1 12HMOS−ARPEGE D 00H


The goal of the probabilistic mixed SA is to fit a linear regressionmodel to each model of an ensemble set, using a BMA.

We have K models Mk . z distribution forms (in the normalizedspace) given by the BMA :

p(z |(Mk)k=1,··· ,K ,Θ, zT ) =K∑

k=1

αkp(z |Mk , θk , zT )

z |Mk , θk , zT ∼ N (

q∑i=1

aki xk

i , σk)

zT = (z1, · · · , zn), θk = ((aki )i=1,··· ,q, σk) et Θ = ((αk , θk)k=1,··· ,K )

We estimate the Θ parameters using the maximum likelihood withthe EM algorithm.

Step E (expectation) : we estimate the probabilities p(Mk |Θ, zT )

p(Mk |Θg , zT ) =αg−1

k p(zT |Mk , θg−1k )∑K

l=1 αg−1l p(zT |Ml , θ

g−1l )


Step M (maximisation) : parameters Θ are iterativaly estimated

αgk =

1

n

n∑j=1

p(Mk |Θg , zj)

(aki )gi=1,··· ,q = (X

′kPg

k Xk)−1X′kPg

k zT

Xk matrix of model predictors k

Pgk =

p(Mk |Θg , z1) 0 · · · 0

0 p(Mk |Θg , z2) · · · 0...

.... . .

...0 0 · · · p(Mk |Θg , zn)

σgk =

√√√√ 1

nαgk

n∑j=1

p(Mk |Θg , zj)(zkgj − zj)

2

Conclusion

I Among the various ways to produce probability distributions,the generalized linear regression by Gaussian anamorphosis hasthe advantage of being inexpensive, robust, to overcomediscrimination techniques requiring a statistical model byclass, and to issue reliable probabilistic forecasts (if theobservations distribution is adjusted properly and the numberof predictors sufficiently high).

I The deterministic forecast produced by this model has goodproperties and may be better than the one obtained by alinear regression.

I Coupled with BMA, the Gaussian anamorphosis can be usedto improve the quality of an ensemble forecast. This is astatistical-dynamical system particularly promising.

Prospects : probabilistic approach and spatialization

I Multi-parameters probabilistic mixed SA (course inprogress).

I Probabilistic spatialization of SA on a regular grid. It canbe facilitated in a normalized space by anamorphosis. Idea of aprocedure :

1. Normalization of all the explanatories variables beforeregression.

2. Report of the regressions based on a spatial classification ofthe parameter analyzed by AROME model.

3. Evaluation of the observations distributions on the grid basedon the observed data of the stations and analyzed by AROME(suggestion : interpolation of the Hermite’s polynomialscoefficients with adjustement cubic splines).

I Exploiting the forecaster’s expertise in probabilisticforecasting, especially in the case of events strongly bi-modal(effect of the presence or not of low clouds on thetemperature).

The endThe end

local forecast probability distributions by statistical...

Documents