nachiketa acharya - india meteorological...
TRANSCRIPT
Seasonal Forecasting Using the Climate Predictability Tool
Validation & Verification in CPT
Nachiketa Acharya ([email protected])
Big Thanks to Dr. Simon Mason
Validation vs Verification
• “Validation” vs. “verification”: we validate a model, but verify forecasts.
• In CPT, “validation” relates to the assessment of a model for deterministic (“best guess”) cross-validated and retroactive predictions; “verification” relates to the assessment of probabilistic forecasts.
Cross-validation

Leave-one-out cross-validation: each year in turn is withheld, the model is trained on all remaining years, and a prediction is made for the withheld year.

1971: predict 1971; train on all other years
1972: predict 1972; train on all other years
1973: predict 1973; train on all other years
1974: predict 1974; train on all other years
1975: predict 1975; train on all other years
… repeat to 2010.
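The leave-one-out scheme above can be sketched as a simple loop. This is a minimal pure-Python illustration, not CPT's implementation: `train_and_predict` is a hypothetical stand-in for whatever model CPT fits, and the `climatology` "model" used here simply predicts the training mean.

```python
def leave_one_out(years, values, train_and_predict):
    """For each year, train on all other years and predict the held-out one.

    `train_and_predict` is a hypothetical callable: it receives the training
    (year, value) pairs and the target year, and returns a prediction.
    """
    predictions = {}
    for held_out in years:
        training = [(y, v) for y, v in zip(years, values) if y != held_out]
        predictions[held_out] = train_and_predict(training, held_out)
    return predictions

def climatology(training, target_year):
    """Trivial stand-in model: predict the mean of the training values."""
    vals = [v for _, v in training]
    return sum(vals) / len(vals)

years = [1971, 1972, 1973]
values = [10.0, 20.0, 30.0]
preds = leave_one_out(years, values, climatology)
# Each prediction uses only the other two years, e.g. 1971 -> (20 + 30) / 2
```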
Leave-k-out cross-validation: a window of k years centred on the target year is withheld (here k = 5); the model is trained on the remaining years and a prediction is made for the centre year.

1971: predict 1971; omit 1972–1973; train on the rest
1972: omit 1971; predict 1972; omit 1973–1974; train on the rest
1973: omit 1971–1972; predict 1973; omit 1974–1975; train on the rest
1974: omit 1972–1973; predict 1974; omit 1975–1976; train on the rest
1975: omit 1973–1974; predict 1975; omit 1976–1977; train on the rest

Example: a 24-year data period (1982–2005) cross-validated in a leave-one-out manner.
Retroactive forecasting
Given data for 1951-2000, it is possible to calculate a retroactive set of probabilistic forecasts. CPT will use an initial training period to cross-validate a model and make predictions for the subsequent year(s), then update the training period and predict additional years, repeating until all possible years have been predicted.
1981: train on 1951–1980; predict 1981; omit 1982 onwards
1982: train on 1951–1981; predict 1982; omit 1983 onwards
1983: train on 1951–1982; predict 1983; omit 1984 onwards
1984: train on 1951–1983; predict 1984; omit 1985 onwards
1985: train on 1951–1984; predict 1985
…
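The retroactive scheme (an expanding training window, predicting only years after the window) can be sketched in the same style. Again this is an illustrative pure-Python sketch, not CPT's code; `train_and_predict` and `climatology` are hypothetical stand-ins.

```python
def retroactive_forecasts(years, values, train_and_predict, initial_len):
    """Train on an expanding window and predict each subsequent year.

    `initial_len` is the length of the initial training period;
    `train_and_predict` is a hypothetical stand-in for the model fit.
    """
    forecasts = {}
    for i in range(initial_len, len(years)):
        training = list(zip(years[:i], values[:i]))  # all years before target
        forecasts[years[i]] = train_and_predict(training, years[i])
    return forecasts

def climatology(training, target_year):
    """Trivial stand-in model: predict the mean of the training values."""
    vals = [v for _, v in training]
    return sum(vals) / len(vals)

years = list(range(1951, 1956))            # 1951-1955
values = [10.0, 20.0, 30.0, 40.0, 50.0]
fcst = retroactive_forecasts(years, values, climatology, initial_len=3)
# 1954 is predicted from 1951-1953 only; 1955 from 1951-1954
```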
Forecasts and observations

Deterministic, discrete: “It will rain tomorrow.”
Deterministic, continuous: “There will be 10 mm of rain tomorrow.”
Probabilistic, discrete: “There is a 50% chance of rain tomorrow.”
Probabilistic, continuous: “There is a p% chance of more than k mm of rain tomorrow.”
Continuous measures compare the best-guess forecasts with the observed values without regard to the categories: forecasts in mm or °C are compared against observations in mm or °C.
Tools ~ Validation ~ Cross-validated ~ Performance measures
Pearson’s correlation
Pearson’s correlation measures association (are increases and decreases in the forecasts associated with increases and decreases in the observations?).
It does not measure accuracy.
When squared, it tells us how much of the variance of the observations is correctly forecast.
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
Correlation: measuring the strength of the linear relationship between two variables (the Pearson product-moment correlation).

Correlation is a systematic relationship between x and y: when one goes up, the other tends to go up, or may tend to go down. It requires corresponding pairs of cases of x and y. “Perfect” positive correlation is +1; “perfect” negative correlation is –1; no correlation (x and y completely unrelated) is 0. Correlation can be anywhere between –1 and +1. A relationship between x and y may or may not be causal; if not, x and y may both be under the control of some third variable. Correlation can be estimated visually from a scatterplot of x against y.
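The Pearson formula can be computed directly from its definition. A minimal pure-Python sketch (the `fcst`/`obs` names are illustrative, not CPT's):

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mx) ** 2 for xi in x)) * \
          sqrt(sum((yi - my) ** 2 for yi in y))
    return num / den

fcst = [1.0, 2.0, 3.0, 4.0]
obs  = [2.0, 4.0, 6.0, 8.0]
r = pearson(fcst, obs)   # perfectly linear relationship, so r is 1 (to rounding)
```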
[Scatterplot examples: correlation = 0.8; correlation = 0.55; correlation = 0 despite a strong nonlinear relationship (the Pearson correlation only detects linear relationships); correlation = 0.87 driven by a single outlier in the upper right.]

If domination by one case is not desired, the Spearman rank correlation (the correlation among ranks instead of actual values) can be used.
Spearman’s correlation

Spearman’s correlation is Pearson’s correlation applied to the ranks of the forecasts and observations. When squared, it tells us how much of the variance of the ranks of the observations is correctly forecast. It does not have as obvious an interpretation as Pearson’s, but it is much less sensitive to extremes.
Spearman rank correlation

Rank correlation is the Pearson correlation between the ranks of X and the ranks of Y, treating the ranks as numbers. It measures the strength of the monotonic relationship between two variables, and it defuses outliers by not honouring the original intervals between adjacent ranks: adjacent ranks simply differ by 1.

A simpler formula for small samples: if the difference in rank for case i is D_i, then

r_s = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}

If the ranks are identical for all cases, all D_i are zero and r_s = 1. An example of the use of this formula is given on the next slide.
Spearman rank correlation

Rank correlation is simply the correlation between the ranks of X and the ranks of Y, treating the ranks as numbers. When there are outliers, or when the X and/or Y data are very much non-normal, the Spearman rank correlation should be computed in addition to the standard correlation.

Example of conversion to ranks: the original numbers 2, 9, 189, 3, 21, 7 have corresponding ranks 6, 3, 1, 5, 2, 4 (or equivalently 1, 4, 6, 2, 5, 3). Note that the difference between 189 and 21 is treated the same as the difference between 9 and 7.
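The small-sample formula can be sketched directly, using the slide's example values. This is an illustrative pure-Python sketch that assumes no tied values (ties need average ranks, which CPT's implementation would handle):

```python
def ranks(values):
    """Rank 1 = smallest value (assumes no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman rank correlation via r_s = 1 - 6*sum(D_i^2) / (n*(n^2-1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

x = [2, 9, 189, 3, 21, 7]          # the slide's example values
# Any monotone transform preserves the ordering, so r_s = 1 despite the
# huge outlier (189) in the data.
r_s = spearman(x, [20, 90, 1890, 30, 210, 70])
```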
2AFC (Kendall’s tau)

Kendall’s correlation measures discrimination (do the forecasts increase and decrease as the observations increase and decrease?). The numerator is the difference between the numbers of concordant and discordant pairs; the denominator is the total number of pairs:

\tau = \frac{n_c - n_d}{n(n-1)/2}

It can be transformed to the probability that the forecasts successfully distinguish the wetter (or hotter) of two observations: 2\mathrm{AFC} = \frac{1}{2}(\tau + 1).
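The pair-counting definition above translates directly into code. A minimal pure-Python sketch, assuming no tied values (ties require one of the tau-b/tau-c adjustments):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / total pairs (no ties)."""
    nc = nd = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            nc += 1     # pair ordered the same way in both series
        elif s < 0:
            nd += 1     # pair ordered oppositely
    n = len(x)
    return (nc - nd) / (n * (n - 1) / 2)

def two_afc(tau):
    """2AFC: probability of picking the wetter/warmer of two observations."""
    return (tau + 1) / 2

fcst = [1.0, 3.0, 2.0, 4.0]
obs  = [10.0, 30.0, 40.0, 20.0]
tau = kendall_tau(fcst, obs)   # 3 concordant, 3 discordant pairs -> tau = 0
```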
Error measures compare the best-guess forecasts with the observed values without regard to the categories. They compare forecasts in mm or °C against observations in mm or °C.
Biases

Mean bias:
• Always close to zero for cross-validated forecasts;
• Slightly negative if the predictand data are positively skewed;
• For retroactive forecasts, indicates the ability to forecast shifts in climate.

Variance (or amplitude) bias:
• Typically very small if skill is low, because the forecasts stay close to the mean.

If there is no mean or variance bias, the RMSE of the forecasts will exceed that of climatology whenever the correlation is less than 0.5.
Root-mean-square skill score (RMSSS) for continuous deterministic forecasts

RMSSS is defined as:

\mathrm{RMSSS} = 1 - \frac{\mathrm{RMSE}_f}{\mathrm{RMSE}_s}

where \mathrm{RMSE}_f is the root-mean-square error of the forecasts,

\mathrm{RMSE}_f = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (f_i - o_i)^2},

and \mathrm{RMSE}_s is the root-mean-square error of the standard used as the no-skill baseline. Both persistence and climatology can be used as the baseline. Persistence, for a given parameter, is the persisted anomaly from the period immediately prior to the long-range forecast (LRF) period being verified; for seasonal forecasts, persistence is the seasonal anomaly from the season prior to the season being verified. Climatology is equivalent to persisting an anomaly of zero.
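The RMSE and RMSSS definitions can be sketched as follows; this is an illustrative pure-Python example with made-up numbers, using a climatology (observed-mean) baseline:

```python
from math import sqrt

def rmse(forecasts, observations):
    """Root-mean-square error of forecasts against observations."""
    n = len(forecasts)
    return sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)

def rmsss(forecasts, observations, reference):
    """RMSSS = 1 - RMSE_f / RMSE_s; positive means beating the baseline."""
    return 1 - rmse(forecasts, observations) / rmse(reference, observations)

obs  = [1.0, 2.0, 3.0, 4.0]
fcst = [1.5, 2.5, 2.5, 3.5]      # errors of +/- 0.5
clim = [2.5] * 4                 # climatology baseline: the observed mean
score = rmsss(fcst, obs, clim)   # positive: forecasts beat climatology
```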
Categorical measures assess the skill of the deterministic forecasts with the observations treated as categories. Some compare forecasts in mm or °C with observations as categories; others compare categories with categories.
Hit scores convert the forecasts to categories and then compare these with the observed categories. But note that the category containing the best guess is not necessarily the most likely!
Hit scores
The contingency tables are based on cross-validated definitions of the categories and so may not perfectly match implied scores from the graph.
Some hits can be expected even with useless forecasts (e.g., guessing, or always forecasting the same outcome).
Tools ~ Contingency Tables ~ Cross-validated
Measures of discrimination: can the forecasts successfully distinguish different outcomes? The observations are categories, but the forecasts are continuous (except where indicated).
ROC diagrams
ROC areas: do we issue a higher probability when the category occurs?
Graph bottom left: when the probabilities are high, does the category occur?
Graph top right: when the probabilities are low, does the category not occur?
Retroactive forecasts of MAM 1986 – 2010 Thailand rainfall using February Pacific SSTs
Relative Operating Characteristics
Continuous scores

Correlations:
• Pearson’s: % variance
• Spearman’s: % variance of ranks
• Kendall’s: 2AFC, the probability of successfully identifying the warmer / wetter observation

Errors:
• Mean bias: unconditional error
• Variance bias: underestimation of variability
• RMSE: combines correlation, mean bias, and variance bias
• MAE: average error
Categorical scores

Hits:
• Hit score: % correct
• Hit skill: % correct, adjusted for guessing
• LEPS: adjusts for near-misses
• Gerrity: adjusts for near-misses

Discrimination:
• 2AFC: probability of successfully identifying the warmer / wetter category
• ROC: probability of successfully identifying an observation in the current category
Significance testing
Tools ~ Validation ~ Cross-validated ~ Bootstrap
Probabilistic Forecasts
Why do we issue forecasts probabilistically?
• We cannot be certain what is going to happen
• The probabilities try to give an indication of how confident we are that the specified outcome will occur.
Verification of probabilistic forecasts
Attributes diagrams: graphs showing reliability, resolution, and sharpness
ROC diagrams: graphs showing discrimination
Scores: a table of scores for probabilistic forecasts
Skill maps: maps of scores for probabilistic forecasts
Tendency diagram: graphs showing unconditional biases
Ranked hits diagram: graphs showing how frequently the observed category had the highest forecast probability
Weather roulette: graphs showing estimates of forecast value
What makes a “good” probabilistic forecast?
Reliability: the event occurs as frequently as implied by the forecast.
Sharpness: the forecasts frequently have probabilities that differ considerably from climatology.
Resolution: the outcome differs when the forecast differs.
Discrimination: the forecasts differ when the outcome differs.
Attributes diagrams
The histograms show the sharpness.
The vertical and horizontal lines show the observed climatology and indicate the forecast bias.
The diagonal lines show reliability and “skill”.
The coloured line shows the reliability and resolution of the forecasts.
The dashed line shows a smoothed fit.
Probabilistic scores
Scores per category
Brier score: mean squared error in probability (assuming that the probability should be 100% if the category occurs and 0% if it does not occur)
Brier skill score: % improvement over Brier score using climatology forecasts (often pessimistic because of strict requirement for reliability)
ROC area: probability of successfully discriminating the category (i.e., how frequently the forecast probability for that category is higher when it occurs than when it does not occur)
Resolution slope: % increase in frequency for each 1% increase in forecast probability
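The Brier score and skill score defined above can be sketched directly. An illustrative pure-Python example with made-up probabilities, using constant tercile climatology (1/3) as the reference:

```python
def brier_score(prob_forecasts, outcomes):
    """Mean squared error of probabilities; outcomes are 1 if the category
    occurred and 0 if it did not. Probabilities are fractions in [0, 1]."""
    n = len(prob_forecasts)
    return sum((p - o) ** 2 for p, o in zip(prob_forecasts, outcomes)) / n

def brier_skill_score(bs_forecast, bs_reference):
    """Fractional improvement of the forecasts over the reference."""
    return 1 - bs_forecast / bs_reference

probs    = [0.8, 0.4, 0.7, 0.2]   # forecast probabilities for one category
occurred = [1, 0, 1, 0]           # whether that category occurred
bs = brier_score(probs, occurred)
bs_clim = brier_score([1 / 3] * 4, occurred)  # constant tercile climatology
bss = brier_skill_score(bs, bs_clim)          # positive: beats climatology
```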
Probabilistic scores
Overall scores
Ranked prob score: mean squared error in cumulative probabilities
RPSS: % improvement over RPS using climatology forecasts (often pessimistic because of strict requirement for reliability)
2AFC score: probability of successfully discriminating the wetter or warmer category
Resolution slope: % increase in frequency for each 1% increase in forecast probability
Effective interest: % return given fair odds
Linear prob score: average probability on the category that occurs
Hit score (rank n): how often the category with the nth highest probability occurs
Verification of Probabilistic Categorical Forecasts: The Ranked Probability Skill Score (RPSS)
Epstein (1969), J. Appl. Meteor.
RPSS measures cumulative squared error between categorical forecast probabilities and the observed categorical probabilities relative to a reference (or standard baseline) forecast. The observed categorical probabilities are 100% in the observed category, and 0% in all other categories.
\mathrm{RPS} = \sum_{cat=1}^{N_{cat}} \left( P^{cum}_{F(cat)} - P^{cum}_{O(cat)} \right)^2

where N_{cat} = 3 for tercile forecasts. The “cum” implies that the summation is done over cumulative probabilities: cat 1, then cats 1 and 2, then cats 1, 2 and 3.

The higher the RPS, the poorer the forecast. RPS = 0 means that a probability of 100% was given to the category that was observed. The RPSS compares the RPS for the forecast with the RPS for a reference forecast, such as one that gives climatological probabilities:

\mathrm{RPSS} = 1 - \frac{\mathrm{RPS}_{forecast}}{\mathrm{RPS}_{reference}}

RPSS > 0 when the RPS for the actual forecast is smaller than the RPS for the reference forecast.
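The cumulative summation in the RPS definition can be sketched as follows; an illustrative pure-Python example for a single tercile forecast, with climatological probabilities as the reference:

```python
def rps(forecast_probs, observed_category):
    """Ranked probability score over cumulative probabilities.

    forecast_probs: probabilities per category (e.g. 3 terciles, summing to 1).
    observed_category: 0-based index of the category that occurred.
    """
    ncat = len(forecast_probs)
    obs_probs = [1.0 if k == observed_category else 0.0 for k in range(ncat)]
    score, cum_f, cum_o = 0.0, 0.0, 0.0
    for k in range(ncat):
        cum_f += forecast_probs[k]   # cat 1, then 1+2, then 1+2+3, ...
        cum_o += obs_probs[k]
        score += (cum_f - cum_o) ** 2
    return score

def rpss(forecast_rps, reference_rps):
    """RPSS = 1 - RPS_forecast / RPS_reference."""
    return 1 - forecast_rps / reference_rps

fcst = [0.5, 0.3, 0.2]           # below / normal / above
clim = [1 / 3, 1 / 3, 1 / 3]     # climatological reference
obs_cat = 0                      # "below" was observed
skill = rpss(rps(fcst, obs_cat), rps(clim, obs_cat))  # positive skill here
```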
What is “skill”?