VERIFICATION OF PROBABILISTIC STREAMFLOW
FORECASTS
by
Tempei Hashino, A. Allen Bradley, and Stuart S. Schwartz
Sponsored by
National Oceanic and Atmospheric Administration (NOAA) No. NA86GP0365 and No. NA16GP1569
IIHR Report No. 427
IIHR-Hydroscience & Engineering and Department of Civil and Environmental Engineering
The University of Iowa, Iowa City, IA 52242-1585
August 2002
ACKNOWLEDGMENTS
This report constitutes the master's thesis of Tempei Hashino. Funding
for the research was provided by the National Oceanic and Atmospheric Adminis-
tration (NOAA) under the following grants: #NA86GP0365 and #NA16GP1569.
This support is gratefully acknowledged.
EXECUTIVE SUMMARY
Long-range streamflow forecasts, such as the ensemble streamflow predictions
(ESP) produced by the National Weather Service (NWS) Advanced Hydrologic
Prediction Services (AHPS), are usually probabilistic forecasts. The format of the
forecast is essentially a continuous probability distribution function, which predicts
the likelihood of occurrence of a streamflow variable, conditioned on the current
hydroclimatic state. Although significant advances in forecast verification method-
ologies have been made in recent years, many of these approaches are not directly
applicable to probabilistic streamflow forecasts. The main purposes of this research
are (1) to extend the distributions-oriented (DO) approach to the verification of
probability distribution forecasts of streamflow, and (2) to demonstrate the useful-
ness of the DO approach in assessing the quality of streamflow forecasts. Techniques
for forecast verification using the DO approach are proposed and studied using prob-
ability distribution forecasts for an experimental forecasting system for the Upper
Des Moines River basin.
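The forecast format described above can be made concrete with a short sketch. Assuming only a set of ensemble traces (the numbers below are synthetic, not data from the Des Moines study), the forecast probability of nonexceedance for any event of interest is simply the empirical fraction of traces at or below the threshold:

```python
import numpy as np

# Hypothetical ensemble of 40 seasonal flow volumes (illustrative, not real data)
rng = np.random.default_rng(0)
traces = rng.lognormal(mean=10.0, sigma=0.5, size=40)

def nonexceedance_prob(ensemble, threshold):
    """Empirical forecast probability that the flow volume is at or below a threshold."""
    return float(np.mean(np.asarray(ensemble) <= threshold))

# Evaluate the forecast distribution at the ensemble's 0.25 quantile
p = nonexceedance_prob(traces, np.quantile(traces, 0.25))
```

Evaluating this empirical distribution at a sequence of thresholds yields the continuous probability distribution forecast discussed in the report.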
One significant obstacle in the verification of probabilistic streamflow fore-
casts is the small data sample available for verification. Verification sample sizes
for long-range hydrologic forecasts are typically much smaller than those available
for weather forecasts. Since verification with the DO approach is equivalent to es-
timation of the joint distribution of forecasts and observations, application of the
DO approach to streamflow forecasts with small samples results in large estimation
uncertainties. Three continuous statistical modeling approaches are considered that
deal with estimation uncertainties by reducing the dimensionality of the verification
problem. Based on Monte Carlo experiments, the continuous approach with
a logistic regression or kernel density estimation produces better estimates of fore-
cast quality, especially with small sample sizes (say 50 or 100), than the traditional
discrete approach with a contingency table. Moreover, the continuous approaches
work better regardless of whether the forecasts are issued as discrete or continuous values.
A significant concern when using the ESP technique for streamflow forecasting
is hydrologic model biases. The simulation biases of the hydrologic model propa-
gate to the probability distribution forecasts through the ensemble traces produced
by the hydrological model, and could degrade the quality of the forecasts. Bias
correction methods are often applied to try to reduce the effects of model biases.
The impacts of three bias correction methods on streamflow forecast quality are
examined using the DO techniques developed for streamflow forecast verification.
The three bias correction methods examined are the Event-Bias Correction method
(EBC), the Regression-Type method, and the Quantile-Mapping method (QM). The
results showed that all bias correction methods improve skill scores, mostly by re-
ducing the conditional bias (Reliability) and unconditional bias (Mean Error). It is
remarkable that in some cases the bias correction methods also improve the associa-
tion (potential skill) between forecasts and observations. The forecasts modified by
EBC tend to have the lowest sharpness and discrimination over all flow quantiles,
whereas QM tends to give the highest sharpness and discrimination. The regression-
type methods tend to fall between these two. This application shows a strength
of the proposed DO approach for probabilistic streamflow verification. Specifically,
the approach produces detailed information on many aspects of forecast quality,
which helps in determining the differences between alternate forecasting systems.
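The skill-score decomposition referred to above can be written, following Murphy's decomposition of the MSE skill score, as SS = ρ_fx^2 − (ρ_fx − σ_f/σ_x)^2 − ((µ_f − µ_x)/σ_x)^2: potential skill minus a reliability (conditional bias) term minus an unconditional bias term. A minimal numerical sketch with synthetic data and hypothetical names (the identity holds exactly for sample moments):

```python
import numpy as np

def mse_skill_score_decomposition(f, x):
    """Decompose SS = 1 - MSE/Var(x) into potential skill (rho^2),
    a reliability (conditional bias) term, and an unconditional bias term."""
    f, x = np.asarray(f, float), np.asarray(x, float)
    rho = np.corrcoef(f, x)[0, 1]           # correlation between forecasts and obs
    sf, sx = f.std(), x.std()               # standard deviations (ddof=0)
    ss = 1.0 - np.mean((f - x) ** 2) / x.var()
    potential = rho ** 2                    # association (potential skill)
    reliability = (rho - sf / sx) ** 2      # conditional bias term
    uncond_bias = ((f.mean() - x.mean()) / sx) ** 2  # mean-error term
    return ss, potential, reliability, uncond_bias
```

Improving reliability or mean error raises SS toward the potential-skill ceiling ρ², which is the behavior the bias correction comparison above describes.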
TABLE OF CONTENTS
Page
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
CHAPTER
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 FORECASTING SYSTEM . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Study Area and Data Resources . . . 3
2.2 Probabilistic Forecasting System . . . 4
2.3 Proposed Verification Approach . . . 5
2.3.1 Forecasts for a Discrete Event . . . 5
2.3.2 Verification Dataset . . . 7
2.4 Discussion . . . 8
2.5 Summary and Conclusions . . . 9
3 VERIFICATION APPROACH . . . . . . . . . . . . . . . . . . . . 12
3.1 Introduction . . . 12
3.2 Distributions-Oriented Measures . . . 14
3.2.1 Bias . . . 15
3.2.2 Accuracy . . . 15
3.2.3 Calibration-Refinement Measures . . . 15
3.2.4 Likelihood-Base Rate Measures . . . 16
3.3 Estimation of Measures . . . 17
3.3.1 Basic Statistics . . . 18
3.3.2 Other Derivative Estimators . . . 19
3.3.3 Estimation of CR Decompositions . . . 21
3.4 Example of Verification . . . 23
3.4.1 Absolute and Relative Measures . . . 23
3.4.2 Marginal and Conditional Distributions . . . 26
3.5 Discussion . . . 27
3.6 Summary and Conclusions . . . 30
4 DISTRIBUTIONS-ORIENTED METHODS FOR SMALL VERIFICATION DATASET . . . 32
4.1 Introduction . . . 32
4.2 Monte Carlo Simulation with Analytical Model for Joint Distribution . . . 34
4.2.1 Assumptions and Procedure . . . 34
4.2.2 Result and Discussion . . . 36
4.3 Monte Carlo Simulation with Stochastic Model of Streamflow Forecasting System . . . 50
4.3.1 Assumptions and Procedure . . . 50
4.3.2 Result and Discussion . . . 55
4.4 Monte Carlo Simulation with Discrete Joint Distribution Model . . . 60
4.4.1 Assumptions and Procedure . . . 60
4.4.2 Result and Discussion . . . 64
4.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . 68
5 ASSESSMENT OF BIAS CORRECTION METHODS FOR ENSEMBLE FORECASTS . . . 70
5.1 Introduction . . . 70
5.2 Biases in Historical Simulations . . . 72
5.3 Bias Correction Methods . . . 73
5.3.1 Event-Bias Correction Method . . . 75
5.3.2 Regression-Type Method . . . 75
5.3.3 Quantile-Mapping Method . . . 78
5.4 Result and Discussion . . . 78
5.4.1 Performance Measures . . . 78
5.4.2 CR Factorization and Decompositions . . . 85
5.4.3 LBR Factorization and Decompositions . . . 88
5.4.4 Results for All Months . . . 94
5.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . 101
6 SUMMARY AND CONCLUSIONS . . . . . . . . . . . . . . . . . 105
6.1 Distributions-Oriented Methods for Small Verification Dataset . . . 105
6.2 Assessment of Bias Correction Methods for Ensemble Forecasts . . . 106
6.3 Future Study and Remarks . . . . . . . . . . . . . . . . . . . 107
APPENDIX
A STATISTICAL METHODS . . . . . . . . . . . . . . . . . . . . . . 110
A.1 Logistic Regression Method . . . 110
A.2 Kernel Density Estimation Method . . . 111
A.3 Combination Method . . . 116
A.4 Contingency Table Approach . . . 116
B SELECTED FIGURES AND TABLES . . . . . . . . . . . . . . . 117
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
LIST OF TABLES
Table Page
2.1 Example of verification dataset for June-September volume forecasts. 8
4.1 Parameters of beta distributions for the analytical model and true forecast quality measures. . . . 36
4.2 Root Mean Squared Error (RMSE) in MSE/σ_x^2, ME/σ_x, TY2/σ_x^2, and DIS/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model. . . . 40
4.3 Root Mean Squared Error (RMSE) in MSE/σ_x^2, ME/σ_x, TY2/σ_x^2, and DIS/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model. . . . 40
4.4 Root Mean Squared Error (RMSE) in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model. . . . 46
4.5 Root Mean Squared Error (RMSE) in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model. . . . 46
4.6 Root Mean Squared Error (RMSE) in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model. . . . 49
4.7 Root Mean Squared Error (RMSE) in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model. . . . 49
4.8 Parameters used in fitting the distribution to observed monthly volume (U) and the first three L-moments of the ensemble volumes (X_ℓ1, X_ℓ2, and X_ℓ3). . . . 52
4.9 Summary statistics of the standardized random variables. . . . . . 53
4.10 Root Mean Squared Error (RMSE) in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the stochastic model. . . . 61
4.11 Root Mean Squared Error (RMSE) in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the stochastic model. . . . 61
4.12 Root Mean Squared Error (RMSE) in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the stochastic model. . . . 62
4.13 Root Mean Squared Error (RMSE) in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the stochastic model. . . . 62
4.14 Basic information and true forecast quality measures of Subjective 12-24-h Projection Probability-of-Precipitation Forecasts for the United States during October 1980-March 1981, from Wilks (1995). . . . 64
4.15 Root Mean Squared Error (RMSE) in REL/σ_x^2 for the forecasts generated by the discrete model. . . . 66
4.16 Root Mean Squared Error (RMSE) in RES/σ_x^2 for the forecasts generated by the discrete model. . . . 66
5.1 Mean, Standard Deviation (SD), and Coefficient of Variation (CV) of the observed monthly volume (cfsd) for the Des Moines River at Stratford. . . . 73
5.2 Mean Error (ME), Root Mean Square Error (RMSE), correlation coefficient (CC), and Mean Square Error (MSE) Skill Score (SSMSE) between the observed monthly volume and historical simulations. . . . 73
B.1 BIAS in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model of joint distribution. . . . 119
B.2 Standard Deviation in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model of joint distribution. . . . 119
B.3 BIAS in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model of joint distribution. . . . 120
B.4 Standard Deviation in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model of joint distribution. . . . 120
B.5 BIAS in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model of joint distribution. . . . 121
B.6 Standard Deviation in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model of joint distribution. . . . 121
B.7 BIAS in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model of joint distribution. . . . 122
B.8 Standard Deviation in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model of joint distribution. . . . 122
LIST OF FIGURES
Figure Page
2.1 Map of Des Moines River Basin. . . . . . . . . . . . . . . . . . . . 5
2.2 Ensemble traces simulated for forecast on 1 June 1965. . . . . . . . 6
2.3 Probability distribution forecast for June-September volume. The ensemble traces are simulated with the current conditions as of 1 June 1965. . . . 6
2.4 Schematic of the current Extended Streamflow Prediction System. 10
3.1 Mean Error (ME) and Mean Square Error (MSE) for June-September seasonal volume forecasts. . . . 24
3.2 CR (on left) and LBR (on right) decompositions of MSE for June-September seasonal volume forecasts. . . . . . . . . . . . . . . . . 25
3.3 Various decompositions of MSE Skill Score for June-September seasonal volume forecasts. The upper left indicates CR decompositions, Relative Resolution (RRES) and Relative Reliability (RREL). The upper right indicates LBR decompositions, Relative Discrimination (RDIS), Relative Sharpness (RS), and Relative Type 2 Conditional Bias (RTY2). The lower left shows Potential Skill, Reliability Measure, and Unconditional Bias Measure. . . . 26
3.4 Reliability diagram for June-September seasonal volume forecasts issued for the 0.25 quantile. . . . 28
3.5 Discrimination diagram for June-September seasonal volume forecasts issued for the 0.25 quantile. . . . 28
4.1 MSE/σ_x^2, ME/σ_x, TY2/σ_x^2, and DIS/σ_x^2 estimated by two approaches for nonexceedance probability p = 0.25; “D” is the discretized (11-binned) approach (DSC), “C” represents a continuous approach such as LRM, KDM, and CM. The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the analytical model. . . . 37
4.2 MSE/σ_x^2, ME/σ_x, TY2/σ_x^2, and DIS/σ_x^2 estimated by two approaches for nonexceedance probability p = 0.05; “D” is the discretized (11-binned) approach (DSC), “C” represents a continuous approach such as LRM, KDM, and CM. The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the analytical model. . . . 38
4.3 Conditional mean of the observations given the forecasts µx|f and marginal distribution of the forecasts s(f) estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.25 with a sample size of 50. The forecasts are produced by the analytical model. . . . 41
4.4 Conditional distribution of the forecasts given the observations r(f|x) estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.25 with a sample size of 50. The forecasts are produced by the analytical model. . . . 42
4.5 Conditional mean of the observations given the forecasts µx|f and marginal distribution of forecasts s(f) estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.05 with a sample size of 50. The forecasts are produced by the analytical model. . . . 43
4.6 Conditional distribution of the forecasts given the observations r(f|x) estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.05 with a sample size of 50. The forecasts are produced by the analytical model. . . . 44
4.7 CR decompositions estimated by four approaches for nonexceedance probability p = 0.25; “D” is the discretized (11-binned) approach (DSC), “L” is logistic regression (LRM), “K” is kernel density estimation directly applied to r(f|x) (KDM), and “C” is the combination of logistic regression and kernel density estimation (CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the analytical model. . . . 45
4.8 CR decompositions estimated by four approaches for nonexceedance probability p = 0.05; “D” is the discretized (11-binned) approach (DSC), “L” is logistic regression (LRM), “K” is kernel density estimation directly applied to r(f|x) (KDM), and “C” is the combination of logistic regression and kernel density estimation (CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the analytical model. . . . 48
4.9 Relations between observations and L-moments for September monthly volume. . . . 51
4.10 Scatterplot of transformed observed monthly volume and transformed L-moments of monthly volume ensembles. . . . 53
4.11 Conditional mean of the observations given the forecasts µx|f estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.25 with sample sizes 50 and 1000. The forecasts are produced by the stochastic model. . . . 56
4.12 Conditional mean of the observations given the forecasts µx|f estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.05 with sample sizes 50 and 1000. The forecasts are produced by the stochastic model. . . . 57
4.13 CR decompositions estimated by four approaches for nonexceedance probability p = 0.25; “D” is the discretized (11-binned) approach (DSC), “L” is logistic regression (LRM), “K” is kernel density estimation directly applied to r(f|x) (KDM), and “C” is the combination of logistic regression and kernel density estimation (CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the stochastic model. . . . 58
4.14 CR decompositions estimated by four approaches for nonexceedance probability p = 0.05; “D” is the discretized (11-binned) approach (DSC), “L” is logistic regression (LRM), “K” is kernel density estimation directly applied to r(f|x) (KDM), and “C” is the combination of logistic regression and kernel density estimation (CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the stochastic model. . . . 59
4.15 True marginal and conditional distributions of the discrete forecasts. 63
4.16 Conditional mean of the observations given the forecasts µx|f estimated by three methods, DSC, LRM, and KDM, with sample sizes 50 and 1000. The forecasts are produced by the discrete model. . . . 65
4.17 CR decompositions estimated by four approaches for the discrete forecasts; “D” is the discretized (12-binned) approach (DSC), “L” is logistic regression (LRM), “K” is kernel density estimation directly applied to r(f|x) (KDM), and “C” is the combination of logistic regression and kernel density estimation (CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the discrete model. . . . 67
5.1 Example of Bias Correction Method applied to ensemble traces. . 71
5.2 Comparison of observed monthly volume and historical simulation from January 1988 to December 1997. . . . 74
5.3 Example of the bias correction for a 1-month lead time forecast with initial condition of January 1949; EBC (Event-Bias Correction Method) is left and RLI (Linear Interpolation) right. . . . 76
5.4 Observed monthly volume versus simulated monthly volume with power function for May and September. . . . 77
5.5 Observed monthly volume versus simulated monthly volume with LOWESS regression for May and September. . . . 78
5.6 Example of the Quantile Mapping method (QM) for a 1-month lead time forecast with initial condition of January 1949. . . . 79
5.7 MSE Skill Score (left) and Skill Score for Bias Correction (right) versus forecasted month for 1, 2, and 3-month lead times, averaged over the quantiles. . . . 81
5.8 Skill Score for Bias Correction for May and September monthly volumes, averaged over the quantiles. . . . 82
5.9 Comparison of Mean Error (left) and Mean Square Error (right) by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts. . . . 83
5.10 Comparison of MSE Skill Score (left) and measure of association (right) by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts. . . . 84
5.11 Comparison of decompositions of Skill Score by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts. The measure of reliability is left, and the measure of unconditional bias is right. . . . 86
5.12 Performance measures and decompositions of MSE Skill Score by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time May monthly volume forecasts. . . . 87
5.13 Marginal distribution of the forecasts s(f) and the conditional mean of the forecasts µx|f by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time September monthly volume forecasts. . . . 89
5.14 CR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts. . . . 90
5.15 Conditional distributions of the forecasts r(f|x = 0) (left) and r(f|x = 1) (right) by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time September monthly volume forecasts. . . . 92
5.16 Conditional mean of the forecasts given the observations µf|x for five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time September monthly volume forecasts. . . . 93
5.17 Conditional mean of the forecasts given the observations µf|x for the EBC and QM bias correction methods with NBC. The forecasts were issued for September monthly volume with 1-month lead time. The three curves for each colour in the bottom two figures show µf|x=1, µf, and µf|x=0 from top to bottom. . . . 93
5.18 Relative sharpness by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts. . . . 94
5.19 LBR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts. . . . 95
5.20 Mean Error and unconditional bias from decomposition of Skill Score by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for all the months with 1, 3, and 6-month lead times. . . . 97
5.21 CR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for all the months with 1, 3, and 6-month lead times. . . . 98
5.22 LBR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for all the months with 1, 3, and 6-month lead times. . . . 99
5.23 Relative sharpness by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for all the months with 1, 3, and 6-month lead times. . . . 100
5.24 MSE Skill Score and potential skill by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for all the months with 1, 3, and 6-month lead times. . . . 102
A.1 Example of logistic regression applied to the pairs of forecasts and observations. . . . 111
A.2 Unbounded estimation with biweight kernel. . . . . . . . . . . . . 112
A.3 Bounded estimation with floating boundary kernel. . . . . . . . . . 114
A.4 Bounded estimation with biweight kernel and reflection boundary technique. . . . 115
A.5 Example of kernel density estimation method applied to forecasts to estimate the marginal distribution s(f). . . . 116
B.1 CR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time May monthly volume forecasts. . . . 117
B.2 LBR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time May monthly volume forecasts. . . . 118
CHAPTER 1
INTRODUCTION
After the devastating floods in 1993 in the Midwest, the National Weather
Service (NWS) proposed development of Advanced Hydrologic Prediction Services
(AHPS) for streamflow forecasting. The first demonstration of the AHPS system was
carried out for the Des Moines River basin. AHPS produces short-range forecasts
of the flood levels and the timing of flood crests. AHPS also produces long-range
probabilistic streamflow forecasts. The forecasts include the chance (or probability)
of minor, moderate, or major flooding, and the chance of exceeding certain
water levels, volumes, and flows on the river over the next 90 days. These
probabilistic forecasts are issued as probability distributions for streamflow, where
streamflow is treated as a continuous random variable. Hence, they are called prob-
ability distribution forecasts, as opposed to more traditional probabilistic forecasts
for discrete events. The probability distribution forecast AHPS produces has the
advantage that users can obtain probabilistic forecasts for the events they are
interested in. On the other hand, probability distribution forecasts are intuitively
more difficult to evaluate than categorized forecasts.
This research defines forecast verification as the procedure to assess the de-
gree of agreement between forecasts and observations, following Murphy and Daan
(1985). Forecast verification has traditionally been implemented using one or more
verification measures (Murphy, 1993). This approach fails to give a complete picture
of the forecast quality for many kinds of forecasts, let alone for probability
distribution forecasts. In the late 1980s, Murphy and Winkler (1987) proposed a
general framework of forecast verification called the Distributions-Oriented (DO)
approach. Based on the joint distribution of forecasts and observations, this ap-
proach unifies and imposes a structure on the verification methodology, provides
insight into the relationships among verification measures, and creates a sound sci-
entific basis to develop and/or choose particular verification measures in specific
contexts (Murphy and Winkler, 1987).
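To make the joint-distribution view concrete: for a probability forecast f of a binary event x, the basic DO measures and the calibration-refinement (CR) decomposition can be computed directly from the sample analogue of the joint distribution. The sketch below is illustrative, with hypothetical names rather than the report's own estimators; for discrete-valued forecasts the identity MSE = UNC + REL − RES holds exactly:

```python
import numpy as np

def do_summary(f, x):
    """Distributions-oriented summary for probability forecasts f of a
    binary event x (sketch of the calibration-refinement decomposition)."""
    f, x = np.asarray(f, float), np.asarray(x, float)
    me = np.mean(f - x)                  # unconditional bias (Mean Error)
    mse = np.mean((f - x) ** 2)          # accuracy
    unc = np.var(x)                      # uncertainty, sigma_x^2
    rel = res = 0.0
    for v in np.unique(f):               # stratify by distinct forecast value
        sel = f == v
        s = sel.mean()                   # s(f): marginal prob of this forecast
        mu = x[sel].mean()               # mu_{x|f}: conditional mean of obs
        rel += s * (v - mu) ** 2         # reliability (conditional bias)
        res += s * (mu - x.mean()) ** 2  # resolution
    return me, mse, unc, rel, res
```

This is the joint distribution of forecasts and observations in action: every measure above is a functional of the sample pairs, which is what makes the DO framework unifying.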
The original DO approach assumes the forecasts and observations are ex-
pressed as discrete variables. Thus, the DO approach is not directly applicable to
probability distribution forecasts of continuous variables. The objectives of this re-
search are (1) to extend the DO approach to the verification problem of probability
distribution forecasts (or ensemble forecasts) of streamflow, and (2) to demonstrate
its usefulness in assessing the quality of streamflow forecasts.
In the application of the DO approach to streamflow forecasts, the major prob-
lem stems from the small sample size. For instance, in the case of meteorological
forecasts, say, maximum daily temperature, 365 pairs of forecasts and observations
would be available per year. After 50 years, 18,250 pairs could be utilized for verifi-
cation. But if the forecast of interest is summer season flow volume, after 50 years,
just 50 pairs would be available for verification. The DO approach outlined by
Murphy (1997) requires the construction of the joint distribution of forecasts and
observations, where forecasts and observations are discrete random variables. With
such a small sample, categorizing continuous probabilistic forecasts into discrete bins
may not be appropriate to estimate the joint distribution, and the verification may
lead to a distorted impression of the forecasting system. In this research, an alternative
approach which does not categorize the probabilistic forecasts is investigated.
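One way to avoid the binning, sketched here with synthetic data rather than the report's forecasts, is to estimate the conditional mean µ_x|f with a smooth logistic curve fitted to the raw forecast-observation pairs (a minimal re-implementation of the logistic-regression idea; the function and parameter names are hypothetical):

```python
import numpy as np

def fit_logistic(f, x, lr=0.5, steps=5000):
    """Estimate mu_{x|f} = P(x=1 | forecast f) with a logistic curve,
    avoiding the bins a contingency table would need (illustrative sketch)."""
    f, x = np.asarray(f, float), np.asarray(x, float)
    a, b = 0.0, 0.0
    for _ in range(steps):                       # plain gradient ascent on
        p = 1.0 / (1.0 + np.exp(-(a + b * f)))  # the log-likelihood
        a += lr * np.mean(x - p)
        b += lr * np.mean((x - p) * f)
    return lambda fq: 1.0 / (1.0 + np.exp(-(a + b * np.asarray(fq))))
```

Because the fitted curve uses every pair directly, a sample of 50 forecasts is not spread thinly over 11 bins, which is the intuition behind the continuous approach.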
In order to demonstrate that the DO approach provides useful information
to assess forecast quality, this research addresses the assessment of bias correction
methods applied to ensemble forecasts. The forecasting system in this research
is based on Extended Streamflow Prediction (ESP), which produces probabilistic
forecasts through statistical analysis of ensemble traces. The ensemble traces are
simulated by a hydrological model. However, in most cases the hydrological model
may have some bias associated with its assumptions or input data. In practice, bias
correction methods are utilized to correct the bias in simulations. However, it is
not clear how the bias in ensemble traces propagates to the probabilistic forecasts,
and how these methods improve the forecasts. The probabilistic forecasts modified
with the bias correction methods are investigated by using the DO approach.
CHAPTER 2
FORECASTING SYSTEM
An experimental forecasting system for the Des Moines River basin (Bradley
and Schwartz, 2000) is used to develop and test approaches for verification of prob-
abilistic streamflow forecasts. Like the Advanced Hydrologic Prediction Services
(AHPS) forecasts from the National Weather Service (NWS), the experimental fore-
casts are made using an ensemble forecasting technique. This chapter explains the
study area and input datasets first, and then discusses how forecasts are made. An
overview of the approach used for verification is given, along with the development of a
verification dataset from the forecasts.
2.1 Study Area and Data Resources
The study area is the Upper Des Moines River basin stretching from the
southern part of Minnesota to central Iowa (Figure 2.1). This research uses the
discharge data obtained at Stratford, Iowa. The Des Moines River basin contains
two major reservoirs, and the Upper Des Moines River is a main source of inflow
into Saylorville Reservoir. Stratford was therefore chosen as the station for this
research, since long-term forecasts of reservoir inflow are important in reservoir
operations.
The drainage area of the Upper Des Moines River basin is about 14,120 km2,
and the elevation ranges from 290 to 518 m above mean sea level. The gently rolling
terrain, formed by continental glaciation and subsequent erosion, supports extensive
cultivated corn fields. The Upper Des Moines River has two main tributaries: the
West Fork and the East Fork Des Moines River. The West Fork River has its origins
in the glacial moraine area of Pipestone, Lyon, and Murray Counties, Minnesota, at
an elevation of about 580 m. Flowing southeastward, the West Fork meets the East Fork, which
flows southeasterly from Jackson County, Minnesota. The subbasins of the West and
East Forks have many lakes, especially in Minnesota. The Upper Des Moines River
passes through Fort Dodge, Iowa and joins the Boone River before Stratford. Ac-
cording to USGS NWISWeb Data for the Nation (http://waterdata.usgs.gov/nwis),
the daily mean streamflow, obtained from 82 years of records, varies from 500 to
6,000 cfs. The maximum peak streamflow of 42,300 cfs was recorded on 2 April 1993.
For more on the hydrological characteristics, see Bae and Georgakakos (1992).
The Hydrological Simulation Program-Fortran (HSPF) (Donigian et al., 1984,
and Bicknell et al., 1997) was applied to the Upper Des Moines River basin, and
the basin was modeled as a single lumped catchment. HSPF is a lumped hydrologic
model that can simulate both watershed hydrology and water quality continuously.
The time series of simulated streamflow is obtained by inputting a set of mean
areal meteorological time series for the land segment. The input time series data
consists of daily data of precipitation and potential evapotranspiration, and hourly
data of air temperature, dew point temperature, wind movement, cloud cover, and
solar radiation. For calibration, daily streamflow records were obtained at Strat-
ford, Iowa, from U.S. Geological Survey (USGS). Precipitation and air temperature
data obtained from the National Climatic Data Center were interpolated over the
basin. The dew point temperature, wind movement, and cloud cover were obtained
for three surface airways stations from National Center for Atmospheric Research
(NCAR). The solar radiation and potential evapotranspiration time series were es-
timated based on the air temperature, dew point temperature, wind movement, and
cloud cover data (see Shuttleworth, 1993).
The HSPF model was calibrated at Stratford with two objective functions: the
first is the root mean squared error of the simulated and observed flows, and
the second is the root mean squared error of the logarithms of these flows.
Both objective functions were evaluated using weekly time step flows. To automate
the calibration of HSPF model parameters, the Shuffled Complex Evolution global
optimization method (SCE-UA) was applied.
2.2 Probabilistic Forecasting System
The experimental forecasting system implemented in this research is based
on Extended Streamflow Prediction (ESP) (Day 1985). ESP produces probabilistic
forecasts by statistical analysis of an ensemble of possible future realizations. This is
the same concept the NWS uses in AHPS.
To explain the basic idea of ESP, an example of streamflow forecasts is shown.
Assume the present time is June 1st 1965, and a forecast will be made of June-
September flow volume. ESP assumes that historical meteorological time series
represent possible realizations in the future. One streamflow trace is simulated by
inputting each historical meteorological time series into HSPF, using the current
Figure 2.1: Map of Des Moines River Basin.
watershed conditions as the initial conditions. Since 48 years of historical record
are available (from 1948 to 1996, excluding the current year), 48 streamflow traces
are obtained (Figure 2.2). As June-September volume is of interest in this example,
flow volumes are computed from the streamflow traces. Then, the cumulative dis-
tribution function of the ensemble traces is estimated by weighting each trace, using
the method proposed by Smith et al. (1992). Finally, the probability distribution
forecasts are produced for June-September volume in terms of nonexceedance prob-
ability (Figure 2.3); for any value of the volume (threshold), the likelihood of the
event whose volume is less than or equal to the threshold is obtained.
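The construction of a nonexceedance-probability forecast from ensemble volumes can be sketched as below. The trace-weighting scheme of Smith et al. (1992) is not reproduced here; equal trace weights with the Weibull plotting position i/(n + 1) stand in for it, and the function names and volumes are hypothetical.

```python
def forecast_cdf(ensemble_volumes):
    """Empirical nonexceedance distribution G_t(y) from ensemble flow volumes.

    Equal trace weights with the Weibull plotting position i/(n+1) stand in
    for the trace-weighting scheme of Smith et al. (1992)."""
    vols = sorted(ensemble_volumes)
    n = len(vols)
    return [(v, i / (n + 1)) for i, v in enumerate(vols, start=1)]

def nonexceedance(ensemble_volumes, threshold):
    """G_t(threshold): weighted fraction of traces with volume <= threshold."""
    n = len(ensemble_volumes)
    count = sum(1 for v in ensemble_volumes if v <= threshold)
    return count / (n + 1)

# 48 traces would come from HSPF runs; here a few illustrative volumes (cfs-days).
volumes = [150e3, 220e3, 310e3, 390e3, 470e3, 600e3, 820e3]
print(nonexceedance(volumes, 376212.0))
```

In the actual system each of the 48 HSPF traces would first be aggregated to a June-September volume before the distribution is formed.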
2.3 Proposed Verification Approach
2.3.1 Forecasts for a Discrete Event
From the framework of ESP explained above, probability distribution forecasts
are obtained in terms of nonexceedance probability. The mathematical definition
Figure 2.2: Ensemble traces simulated for forecast on 1 June 1965 (streamflow in cfs-days versus days after June 1, Des Moines River at Stratford).
Figure 2.3: Probability distribution forecast for June-September volume (volume in cfs-days versus nonexceedance probability in percent), Des Moines River near Stratford, Iowa. The ensemble traces are simulated with the current conditions as of 1 June 1965.
of the forecasts is given as:
Gt(y) = P{Y ≤ y|αt}, (2.1)
where P{Y ≤ y|αt} is the probability that the forecast variable Y , for example
monthly streamflow volume, is less than or equal to some threshold y, conditioned
on the state of the hydroclimatic system αt at a certain forecast date t. Obviously,
it is not straightforward to verify the forecasts in the form of Gt(y). The follow-
ing discusses the approach taken for the verification of the probability distribution
forecasts.
First, consider a discrete event Y ≤ yp where yp has the climatological nonex-
ceedance probability p. The probabilistic forecast for the event Y ≤ yp is simply
given as
f(yp) = Gt(yp). (2.2)
Then, the corresponding observation can be discretized as:
x(yp) = 1, Y ≤ yp
= 0, Y > yp. (2.3)
Therefore, pairs of probabilistic forecasts f(yp) and discrete observations x(yp)
are obtained for the discrete event Y ≤ yp. The pairs that are used to esti-
mate the joint distribution of f(yp) and x(yp) are called a verification dataset.
From one verification dataset, one set of measures of forecast quality for a thresh-
old yp is computed. To evaluate the quality of probability distribution forecast
over the range of possible outcomes, this research uses nine thresholds yp with
p = 0.05, 0.10, 0.25, 0.33, 0.50, 0.66, 0.75, 0.90, 0.95. Hence, nine verification datasets
are obtained.
2.3.2 Verification Dataset
One verification dataset for threshold yp is made up of the N pairs of forecasts
f(yp) and observations x(yp). It is important to note that one verification dataset
contains only a portion of the information needed to obtain a complete picture of the
forecast quality. Table 2.1 shows an example of a verification dataset for June-September vol-
ume forecasts. The forecasts are issued on June 1st every year for the event Y ≤ yp
with the threshold yp = 376212 (p = 0.66). According to the probability distribution
forecast shown in Figure 2.3, the probabilistic forecast issued on June 1st, 1965 for
this event is f(yp) = 0.350422. The volume observed from June to September 1965 is
377897 cfs-days, which is slightly greater than yp. Therefore, the corresponding discrete
observation x(yp) is equal to 0, indicating that the event did not occur.
Table 2.1: Example of verification dataset for June-September volume forecasts.

Date of Forecast(a)   f(yp)      x(yp)(b)   Obs.(c) Y (cfs-d)
1949/06/01            0.846400   1           72315
1950/06/01            0.723477   1          198444
1951/06/01            0.518131   0          677080
1952/06/01            0.761163   1          259610
1953/06/01            0.766671   1          303072
:                     :          :               :
1964/06/01            0.847121   1          229902
1965/06/01            0.365253   0          377897
1966/06/01            0.762934   1          132601
1967/06/01            0.930124   1          301787
:                     :          :               :

(a) The forecasts were issued on June 1st every year.
(b) The threshold for the forecasts, yp = 376212, is the 0.66 quantile.
(c) Obs. is the observed June-September volume.
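Assembling a verification dataset like Table 2.1 can be sketched as follows. The representation of each probability distribution forecast as a callable CDF, and all numerical values, are illustrative assumptions.

```python
def verification_dataset(forecast_cdfs, observed_volumes, threshold):
    """Pairs (f(yp), x(yp)) for the discrete event Y <= yp.

    forecast_cdfs: one callable G_t per forecast date, mapping a volume y
    to the nonexceedance probability P{Y <= y}.
    observed_volumes: the observed volume Y for each forecast date."""
    pairs = []
    for G, y_obs in zip(forecast_cdfs, observed_volumes):
        f = G(threshold)                    # probabilistic forecast for Y <= yp
        x = 1 if y_obs <= threshold else 0  # discrete observation
        pairs.append((f, x))
    return pairs

# Illustrative: two forecast dates with hypothetical CDFs and observations.
yp = 376212.0  # 0.66-quantile threshold from Table 2.1
cdfs = [lambda y: min(y / 1.0e6, 1.0), lambda y: min(y / 5.0e5, 1.0)]
obs = [377897.0, 198444.0]  # first observation slightly exceeds yp, so x = 0
print(verification_dataset(cdfs, obs, yp))
```

Repeating this for the nine thresholds yields the nine verification datasets described above.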
2.4 Discussion
Research on ensemble forecasting is more extensive in the meteorological field,
where an ensemble forecasting system is often called an Ensemble Prediction
System (EPS); in the hydrological field it is called Extended Streamflow Prediction
(ESP). The main difference between current EPS and ESP is the way the ensemble
traces are produced. Figure 2.4 shows the current version of ESP used by the NWS.
NWS has extended the original idea to facilitate incorporation of climate outlooks
into the ESP (Perica, 1998). The NWS ESP program produces ensemble traces
by inputting historical meteorological events adjusted with meteorological and cli-
matological forecasts, and deterministic precipitation forecasts. Another way to
incorporate climate outlooks and probabilistic meteorological forecasts is to adjust
the weights of ensemble traces simulated with historical meteorological events (Croley,
2000). Further investigation of the incorporation of climate and meteorological
forecasts into ESP is needed.
On the other hand, in meteorological research, the ensembles of geopotential
heights, temperatures, or moisture are created from slightly different initial condi-
tions. The main methods to generate the initial conditions of the ensemble members
are (1) Monte Carlo methods, (2) methods which generate perturbations dynam-
ically constrained by the flow of the day, including breeding and singular vectors,
(3) the perturbed observations method which uses data assimilation cycles with
random errors, and (4) methods which make perturbations by varying the model
parameterizations of subgrid-scale physical processes (Hou et al. 1998). These en-
sembles of meteorological variables could be directly input into a hydrological model
to produce an ensemble of streamflow.
As mentioned in Chapter 1, AHPS provides probabilistic forecasts which in-
dicate the exceedance probability of certain levels over the next 90 days. In the
meteorological field, ensemble traces have been utilized mainly in the following four
ways (Anderson 1996): (1) use the ensemble mean forecast as a substitute for a sin-
gle discrete forecast; (2) produce a small, easily understood set of forecast states by
clustering algorithms; (3) make a priori predictions of forecast skill, that is, figure
out the relation between ensemble spread and skill of the control forecast; and (4)
examine the entire ensemble to extract as much information as possible. For ex-
ample, the quantitative precipitation forecasts are given by exceedance probability
for some continuous thresholds. In fact, this is the same method that AHPS utilizes.
Recently, most effort has been devoted to item (4).
As shown, many aspects of ensemble forecasting are common to the meteorological
and hydrological fields. Cooperation between researchers in these fields is
necessary to improve ensemble forecasts of streamflow.
2.5 Summary and Conclusions
This research utilizes an experimental forecasting system that has been devel-
oped for the Upper Des Moines River basin. The discharge at Stratford, Iowa, was
[Figure 2.4 is a schematic of the NWS Extended (Ensemble) Streamflow Prediction procedure: historical series of precipitation and temperature (1948-1995) are adjusted with meteorological forecasts and climate outlooks (1- to 5-day, 6- to 10-day, monthly, and 13 3-month outlooks) through the NWS adjustment procedure; the adjusted series, the current conditions (snowpack, soil moisture, streamflow, reservoir levels), and a 24(48)-hour deterministic precipitation forecast are input to the NWS hydrologic model, which produces streamflow traces grouped by probability range (0-25%, 25-50%, 50-75%).]

Figure 2.4: Schematic of the current Extended Streamflow Prediction System. Source: Perica, Sanja, Integration of Meteorological Forecasts/Climate Outlooks Into Extended Streamflow Prediction (ESP) System, http://www.nws.noaa.gov/oh/hrl/papers/ams/ams98-6.htm (accessed March 10, 1998).
chosen since long-term forecasts of inflow into the downstream reservoir are impor-
tant for operations. The Upper Des Moines River basin drains about 14,120 km2,
and the gently rolling terrain was formed by continental glaciation and subsequent
erosion. In 1993, the record-breaking peak streamflow of 42,300 cfs was observed.
The Hydrological Simulation Program-Fortran (HSPF), which is a lumped
hydrologic model, was applied to the Upper Des Moines River basin. Sets of mean
areal time series data, such as daily precipitation, potential evapotranspiration,
hourly data of air temperature, and so on, were produced from various sources.
The HSPF model was calibrated at Stratford with two objective functions, and the
optimum parameters were obtained automatically with the Shuffled Complex Evolution
global optimization method (SCE-UA).
The experimental forecasting system is based on the idea of Extended Stream-
flow Prediction (ESP). The historical meteorological time series were input into
the HSPF model, and streamflow was simulated using the current hydroclimatic
conditions on the forecast date as the initial conditions. The simulated streamflow
outputs, called ensemble traces, are assumed to represent different possible
realizations of the future. Finally, the probability distribution forecast for the
forecast date, expressed in nonexceedance probability, was produced by statistical
analysis of the ensemble traces.
The problem is to verify the forecast over a continuous range of streamflows, since
the probability distribution forecast gives a probabilistic forecast for any possible
outcome. One solution is to consider a discrete event in which the forecast variable
is less than or equal to a threshold. For this event, one probabilistic forecast is
derived from the probability distribution forecast in terms of nonexceedance
probability. The corresponding continuous observation is converted into 0 or 1;
0 indicates the event did not occur, and 1 means the event occurred. One pair of a
probabilistic forecast and a discrete observation is obtained for every forecast
date. Thus, one verification dataset for a threshold contains as many pairs of
forecasts and observations as there are forecast dates. Since nine quantiles of the
observations are used as thresholds covering the possible outcomes, nine verification
datasets are computed. Investigation of these nine verification datasets can be
considered equivalent to an examination of the forecast quality of the probability
distribution forecast.
CHAPTER 3
VERIFICATION APPROACH
The proposed approach for verification of ensemble streamflow predictions
involves selecting discrete events. The probabilistic forecast for an event – a forecast
variable is less than or equal to a threshold – is obtained from the probability
distribution forecast. The corresponding continuous observation is also converted
into a discrete number: 1 indicates that the event occurred, 0 means that the
event did not occur. The verification dataset for the event consists of the pairs
of probabilistic forecasts and discrete observations. Using the verification datasets
derived for discrete events, forecast quality of the probability distribution forecast
can be assessed over the range of possible outcomes.
In this chapter, a distributions-oriented (DO) approach for forecast verification
is described. The DO approach is extended to the case of continuous probabilistic
forecasts, with parametric and nonparametric techniques used to estimate the joint
distribution of forecasts and observations. The technical methods are then described
in detail, followed by a discussion of DO measures and other common measures for
forecast verification. The technical methods in the extended DO approach will be
assessed in the next chapter.
3.1 Introduction
Verification procedures can be classified into two categories (Murphy, 1997):
a measures-oriented (MO) approach and a distributions-oriented (DO) approach.
The MO approach is traditionally used in the verification process. As the name
implies, this approach emphasizes calculating quantitative measures of only one or
two aspects of forecast quality, such as bias, accuracy, or skill, and then draws
conclusions based on these measures. In most cases, the mean squared error (hereafter
referred to as MSE) and the correlation coefficient (CC) are used as accuracy
measures. However, CC was shown to be a measure of potential skill by Murphy et
al. (1989). Although many verification measures had been developed, until the
1980’s the investigation of the relationships between measures, examination of their
relative strengths and weaknesses, or general concepts about verification itself had
not been studied extensively (Murphy and Winkler, 1987). For example, Barnston
(1992) showed the nonlinear, one-to-one relationship between CC and RMSE for
standardized forecasts and observations, and the significant variation of the mean
correspondence between CC and Heidke score with the number of equally likely
Heidke categories. Murphy (1995) concluded that the coefficient of determination
is superior to the CC as a measure of linear association, and that neither is a
proper measure of skill.
The DO approach was developed in the 1980’s. Since then, the DO approach
has played an important role, especially in the verification of meteorological fore-
casts. For instance, the diagnostic verification of Climate Prediction Center Long-
Lead Outlooks has been done with this approach (Wilks 2000). The forecasts made
by human forecasters and guidance products from numerical weather prediction
models were investigated (Brooks and Doswell, 1996). Also, the verification of the
forecasts produced based on the Ensemble Prediction System (EPS) has been done
(e.g., Hamill and Colucci 1997, and Hou et al. 1998). The DO approach involves
the use of the joint distribution of forecasts and observations, from which all the
measures of forecast quality are derived systematically. The reason why the DO ap-
proach is preferable is that it gives insights on forecast quality from various aspects
and allows the user to identify the situations in which forecast performance may be
weak or strong, something the MO approach fails to do (Brooks and Doswell III,
1996).
The major difficulty in applying this approach to verification stems from the
estimation of the joint distribution. Two fundamental characteristics of the verifi-
cation problem are complexity and dimensionality, which are quantitatively defined
by Murphy (1991). Complexity is defined by the number of factorizations (CF ),
number of basic factors in each factorization (CBF ), or total number of basic factors
(CTBF ). For example, in Absolute Verification (AV), where one kind of observation
and one forecasting system are examined, the joint distribution can be factorized
into one conditional and one marginal distribution. Thus, CF = 2, CBF = 2, and
CTBF = 4. On the other hand, the general definition of dimensionality D is that
D is the number of degrees of freedom in order to estimate the joint distribution of
forecasts and observations. In the case where a forecasting system uses nx categories
for observations and nf for forecasts, the dimensionality D is defined as
D = nf × nx − 1. (3.1)
For example, when a forecast is issued in 11 categories from 0 to 1 at 0.1 intervals
for a dichotomous (two-category) observation, the verification problem has the
dimension D = 11 × 2 − 1 = 21. Then, given 50 pairs of forecasts and observations,
which is not unusual with hydrological variables, it is no wonder that some bins may
not have enough (or any) subsamples to estimate the joint distribution. Hence, most
verification problems suffer from the “curse of dimensionality” (Murphy, 1997). In
Chapter 4, techniques to reduce the dimensionality (but not the complexity) are
investigated. This chapter describes the measures of the DO approach tailored to
probabilistic forecasts and dichotomous observations, and their estimators, in such
a way that the dimensionality of the joint distribution is reduced. As a result, all
of the DO measures are derived from six basic variables and one integral.
3.2 Distributions-Oriented Measures
One can derive the distributions-oriented (DO) measures from the joint distribution
of forecasts and observations p(f, x), where f is the probabilistic forecast
issued for an event that forecast variable Y is equal to or less than a threshold yp
(f = f(yp)), and x is the corresponding discrete observation (x = x(yp)), which
takes on 1 for occurrence of the event and 0 for no occurrence. The measures de-
rived from the joint distribution can be examined over the range of thresholds for
which the forecasts are issued.
To cast light on understanding of the joint distribution from various aspects,
it can be factorized into one conditional and one marginal distribution in two ways
(Murphy and Winkler, 1987):
CR factorization: p(f, x) = q(x|f)s(f) (3.2)
LBR factorization: p(f, x) = r(f |x)t(x). (3.3)
The calibration-refinement (CR) factorization is used more often and is easier to
understand, partly because a forecast is issued first and the observation is then
compared with it (Murphy and Winkler 1987, Brooks and Doswell III 1996). On the
other hand, it is easier to reconstruct the marginal and conditional distributions
of the likelihood-base rate (LBR) factorization, since the observation random
variable x takes on only 1 or 0. This research mainly utilizes the following
measures described in Murphy
(1997).
3.2.1 Bias
The expected value µf for the marginal distribution of the forecasts s(f) and
the expected value µx for the marginal distribution of the observations t(x) are
utilized to characterize the unconditional bias defined as Mean Error (ME):
ME = µf − µx. (3.4)
3.2.2 Accuracy
A measure of the Accuracy of the forecasts is the mean squared error (MSE),
which is defined using the joint distribution p(f, x) as:

MSE(f, x) = Σ_f Σ_x p(f, x)(f − x)². (3.5)

One decomposition of the MSE can be written as (Murphy, 1988):

MSE(f, x) = (µf − µx)² + (σf − σx)² + 2(1 − ρf,x)σfσx. (3.6)
The last term, called dispersion error, may be considered the most important mea-
sure of the forecast error, because it cannot be calibrated out (Hou et al., 1998).
The Skill of the forecast is the accuracy relative to a reference forecast
methodology. The skill score using climatology as a reference (i.e., the forecast
f is the unconditional mean µx) is:

SSMSE(f, µx, x) = 1 − [MSE(f, x)/σ²x], (3.7)

where σ²x is the variance of the observations. A decomposition of SSMSE,

SSMSE(f, µx, x) = ρ²fx − [ρfx − (σf/σx)]² − [(µf − µx)/σx]², (3.8)

consists of a measure of potential skill (the first term), a relative measure of
reliability, also known as Type 1 conditional bias (the second term), and a relative
measure of unconditional bias (the third term). The third term is better than ME
when the unconditional bias is compared over the possible outcomes.
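The measures ME, MSE, and SSMSE of Equations (3.4), (3.5), and (3.7) can be computed directly from a verification dataset. A minimal sketch follows; the forecast-observation pairs are illustrative, not from the thesis dataset.

```python
def me(pairs):
    """Mean error (unconditional bias), Eq. (3.4)."""
    n = len(pairs)
    mu_f = sum(f for f, _ in pairs) / n
    mu_x = sum(x for _, x in pairs) / n
    return mu_f - mu_x

def mse(pairs):
    """Mean squared error, the sample analogue of Eq. (3.5)."""
    return sum((f - x) ** 2 for f, x in pairs) / len(pairs)

def skill_score(pairs):
    """MSE skill score against climatology, Eq. (3.7)."""
    n = len(pairs)
    mu_x = sum(x for _, x in pairs) / n
    var_x = sum((x - mu_x) ** 2 for _, x in pairs) / n
    return 1.0 - mse(pairs) / var_x

pairs = [(0.8464, 1), (0.7235, 1), (0.5181, 0), (0.7612, 1), (0.3653, 0)]
print(me(pairs), mse(pairs), skill_score(pairs))
```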
3.2.3 Calibration-Refinement Measures
Given a specific probability forecast f , certain aspects of the distribution
of observations x are desirable. The Calibration-refinement factorization, which is
conditional on the forecasts, can be used to explore these aspects of forecast quality.
Reliability (Type 1 conditional bias) describes the bias of the observations
given a forecast f . Forecasts that are conditionally unbiased are desirable. One
measure of this conditional bias is:
REL = Ef[(µx|f − f)²] (3.9)
where Ef denotes the expected value with respect to the distribution of the forecasts
and µx|f is the expected value of the observations conditional on the forecasts.
Resolution indicates the degree to which the mean observation for a specific
forecast f differs from the unconditional mean (or climatological probability). Fore-
casts with large differences (higher resolution) are more desirable. One measure of
the resolution is:
RES = Ef[(µx|f − µx)²] (3.10)
The connection between the reliability and resolution of the forecasts, and the
MSE (or skill) of the forecasts, can be seen through a decomposition of the MSE
into its components. Conditioning on the forecasts leads to the so-called
calibration-refinement (CR) decomposition:

MSECR(f, x) = σ²x + REL − RES, (3.11)

where σ²x, the variance of the observations, measures the inherent uncertainty. If
the p-quantile of the observations is used as the threshold for which forecasts are
issued, the uncertainty is obtained analytically as:

σ²x = p(1 − p). (3.12)

In the case of perfect forecasts, since MSE = 0 and REL = 0, RES = σ²x.
Substituting the CR decomposition into SS (Equation (3.7)) gives

SS = RES/σ²x − REL/σ²x. (3.13)
This research evaluates the measures of forecast quality over the range of possible
outcomes. It is more insightful to use measures of Resolution and Reliability
relative to the Uncertainty of the events. Thus, RES/σ²x is referred to as Relative
Resolution (RRES) and REL/σ²x is called Relative Reliability (RREL). Perfect
forecasts have RRES = 1 and RREL = 0.
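The CR decomposition of Equation (3.11) can be illustrated by conditioning on each distinct forecast value. This is a sketch only (the thesis deliberately avoids exactly this discretization for small samples, as discussed in Section 3.3); with these sample definitions the identity MSE = σ²x + REL − RES holds exactly.

```python
def cr_decomposition(pairs):
    """Calibration-refinement decomposition of the MSE, Eq. (3.11),
    conditioning on each distinct forecast value."""
    n = len(pairs)
    mu_x = sum(x for _, x in pairs) / n
    unc = sum((x - mu_x) ** 2 for _, x in pairs) / n   # uncertainty sigma_x^2
    rel = res = 0.0
    for f0 in set(f for f, _ in pairs):
        sub = [x for f, x in pairs if f == f0]
        s_f = len(sub) / n                 # relative frequency s(f)
        mu_x_f = sum(sub) / len(sub)       # conditional mean of observations
        rel += s_f * (mu_x_f - f0) ** 2    # reliability term
        res += s_f * (mu_x_f - mu_x) ** 2  # resolution term
    return unc, rel, res

pairs = [(0.2, 0), (0.2, 0), (0.2, 1), (0.8, 1), (0.8, 1), (0.8, 0)]
unc, rel, res = cr_decomposition(pairs)
mse_direct = sum((f - x) ** 2 for f, x in pairs) / len(pairs)
print(abs(mse_direct - (unc + rel - res)) < 1e-12)  # identity (3.11) holds
```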
3.2.4 Likelihood-Base Rate Measures
Given a specific discrete observation x (i.e., the event occurs or it does not),
certain aspects of the distribution of the probabilistic forecasts f are desirable. The
likelihood-base rate factorization, which conditions on the observations, can be used
to explore these aspects of forecast quality.
Discrimination describes the degree to which the forecasts differ for a specific
observation x (x = 1, or x = 0). Forecasts with larger differences (higher discrimi-
nation) are more desirable. One measure of the discrimination is:
DIS = Ex[(µf|x − µf)²] (3.14)
where Ex is the expected value with respect to the distribution of the observations
and µf |x is the expected value of the forecasts given an observation.
In the same way as CR decomposition, the connection between the discrimi-
nation of forecasts, and the MSE (or skill) can be seen through likelihood-base rate
(LBR) decomposition:

MSELBR(f, x) = σ²f + Ex[(µf|x − x)²] − DIS. (3.15)

The first term in the decomposition measures the sharpness of the forecasts. The
second term is a measure of the bias of the forecasts conditioned on the observations,
which is called the Type 2 conditional bias:

TY2 = Ex[(µf|x − x)²]. (3.16)

The Sharpness, σ²f, is a measure of the degree to which the probability forecasts
approach 0 and 1. Higher sharpness indicates more confidence in the forecast
outcome.
Substituting the LBR decomposition into SS (Equation (3.7)) gives

SS = 1 + DIS/σ²x − σ²f/σ²x − TY2/σ²x. (3.17)

Again, DIS/σ²x, σ²f/σ²x, and TY2/σ²x are referred to as Relative Discrimination
(RDIS), Relative Sharpness (RS), and Relative Type 2 Conditional Bias (RTY2). A
perfect forecast system has RDIS = 1, RS = 1, and RTY2 = 0.
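A corresponding sketch of the LBR decomposition, Equation (3.15), conditions on the two observation values; the pairs are again illustrative.

```python
def lbr_decomposition(pairs):
    """Likelihood-base rate decomposition of the MSE, Eq. (3.15):
    MSE = sigma_f^2 + TY2 - DIS."""
    n = len(pairs)
    mu_f = sum(f for f, _ in pairs) / n
    sharp = sum((f - mu_f) ** 2 for f, _ in pairs) / n   # sharpness sigma_f^2
    ty2 = dis = 0.0
    for x0 in (0, 1):
        sub = [f for f, x in pairs if x == x0]
        if not sub:
            continue                        # observation value never occurred
        t_x = len(sub) / n                  # base rate t(x)
        mu_f_x = sum(sub) / len(sub)        # conditional mean of forecasts
        ty2 += t_x * (mu_f_x - x0) ** 2     # Type 2 conditional bias, Eq. (3.16)
        dis += t_x * (mu_f_x - mu_f) ** 2   # discrimination, Eq. (3.14)
    return sharp, ty2, dis

pairs = [(0.2, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
sharp, ty2, dis = lbr_decomposition(pairs)
mse_direct = sum((f - x) ** 2 for f, x in pairs) / len(pairs)
print(abs(mse_direct - (sharp + ty2 - dis)) < 1e-12)  # identity (3.15) holds
```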
3.3 Estimation of Measures
This research aims to verify the forecasting system using the observations, which
follow the Bernoulli distribution described in Chapter 2, and the forecasts, which
are expressed as probabilities. In practice, a forecast is often issued as one of a
discrete set of values. Discrete forecasts may be used because of the limited
accuracy of the physical model or measurement, as an effort to reduce the cost of
recording data (forecasts), or because of the difficulty in verification related to
dimensionality. However, discretizing a forecast that is issued as a continuous
number for the sake of verification raises questions. How
does the discretization, especially with a small sample size, affect the measures
of forecast quality? How much information about the forecasting system is lost or
distorted by discretization? To answer these questions, results obtained by
discretizing the forecasts will be compared in Chapter 4 with results obtained by
treating the forecasts as continuous numbers. In this section, estimators for the
DO measures above are presented so that they can be computed in a continuous manner.
3.3.1 Basic Statistics
Certain properties are desirable in estimators of forecast quality measures.
Estimators that are unbiased with low variance are best. Given the small
sample sizes, estimators that utilize the entire sample of forecasts and/or observa-
tions would be better than those based on conditional subsamples. If conditional
subsamples are used, estimators for low-order moments would be better than those
for higher-order moments. Based on these considerations, the basic moments chosen
to be estimated are the mean and variance of the observations, µx and σ²x, the
mean and variance of the forecasts, µf and σ²f, and the conditional means of the
forecasts given the observations, µf|x=0 and µf|x=1. The four quantities related to
the marginal distributions can be estimated with the entire sample:
µx = (1/N) Σ_{i=1}^{N} xi (3.18)

σ²x = [1/(N − 1)] Σ_{i=1}^{N} x²i − [N/(N − 1)] µ²x (3.19)

µf = (1/N) Σ_{i=1}^{N} fi (3.20)

σ²f = [1/(N − 1)] Σ_{i=1}^{N} f²i − [N/(N − 1)] µ²f (3.21)

Then, since the observations are Bernoulli random variates,

t(x = 1) = µx (3.22)

t(x = 0) = 1 − µx. (3.23)
Next, divide the pairs (fi, xi), i = 1, · · · , N into two sets; one set has x = 0
(A) and another has x = 1 (B). Denote NA and NB the numbers of pairs included
in A and B, respectively.
µf|x=0 = (1/NA) Σ fi, (fi, xi) ∈ A (3.24)

µf|x=1 = (1/NB) Σ fi, (fi, xi) ∈ B (3.25)
Note that these two estimators can have significant uncertainty when the subsample
sizes are small.
These statistics are called basic because most forecast quality measures (except
the CR decomposition measures) can be derived from them. This is beneficial in that
the propagation of uncertainty can be examined more easily with a limited set of
estimators.
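The six basic statistics of Equations (3.18)-(3.25) can be estimated as below; a sketch with illustrative pairs that are not from the thesis dataset.

```python
def basic_statistics(pairs):
    """Six basic statistics of Section 3.3.1 (Eqs. 3.18-3.25).

    Variances use the unbiased N-1 forms of Eqs. (3.19) and (3.21)."""
    n = len(pairs)
    fs = [f for f, _ in pairs]
    xs = [x for _, x in pairs]
    mu_x = sum(xs) / n
    mu_f = sum(fs) / n
    var_x = (sum(x * x for x in xs) - n * mu_x ** 2) / (n - 1)
    var_f = (sum(f * f for f in fs) - n * mu_f ** 2) / (n - 1)
    set_a = [f for f, x in pairs if x == 0]   # subsample A: event did not occur
    set_b = [f for f, x in pairs if x == 1]   # subsample B: event occurred
    mu_f_x0 = sum(set_a) / len(set_a)         # Eq. (3.24)
    mu_f_x1 = sum(set_b) / len(set_b)         # Eq. (3.25)
    return mu_x, var_x, mu_f, var_f, mu_f_x0, mu_f_x1

pairs = [(0.85, 1), (0.72, 1), (0.52, 0), (0.76, 1), (0.37, 0)]
print(basic_statistics(pairs))
```

As the text notes, the two conditional means are the only quantities that rely on conditional subsamples, so they carry the largest sampling uncertainty.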
3.3.2 Other Derivative Estimators
First, estimators for the basic performance measures of forecast quality (namely,
ME, MSE, and SS) are expressed with the basic estimators µx, µf, σ²x, σ²f, µf|x=0,
and µf|x=1 discussed in the previous section. ME (see Section 3.2) is simply:

ME = µf − µx. (3.26)
The measure of accuracy is also estimated with the basic estimators:

MSE(f, x) = Σ_{x∈{0,1}} ∫₀¹ (f − x)² p(f, x) df
          = Σ_{x∈{0,1}} ∫₀¹ (f² − 2fx + x²) p(f, x) df
          = ∫₀¹ f² q(x = 0|f)s(f) df + ∫₀¹ f² q(x = 1|f)s(f) df
            − 2 ∫₀¹ f · 1 · r(f|x = 1)t(x = 1) df + ∫₀¹ 1² · r(f|x = 1)t(x = 1) df
          = ∫₀¹ f² s(f) df − 2 t(x = 1) ∫₀¹ f r(f|x = 1) df + t(x = 1) ∫₀¹ r(f|x = 1) df
          = (σ²f + µ²f) − 2 µx µf|x=1 + µx. (3.27)
Then, the MSE Skill Score (SSMSE) is estimated by Equation (3.7). To derive
the estimator for the correlation coefficient (CC), also called the potential skill
or association, first note that:

E[fx] = Σ_{x∈{0,1}} ∫₀¹ (fx) p(f, x) df
      = ∫₀¹ f r(f|x = 1)t(1) df
      = µx µf|x=1. (3.28)
Then the CC can be written as:
ρfx =E[fx]− E[f ]E[x]√
σ2fσ
2x
=µxµf |x=1 − µfµx√
σ2fσ
2x
. (3.29)
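These estimators can be checked numerically against direct sample computations. The sketch below uses plug-in (population, ddof = 0) moments so that the identities hold exactly in-sample; with the unbiased forms (3.19) and (3.21) they hold up to O(1/N) terms. The function name is illustrative.

```python
import numpy as np

def derived_measures(f, x):
    """ME (3.26), MSE (3.27), and CC (3.29) from the basic statistics."""
    f = np.asarray(f, dtype=float)
    x = np.asarray(x, dtype=float)
    mu_x, mu_f = x.mean(), f.mean()
    var_x, var_f = x.var(), f.var()          # plug-in (ddof=0) moments
    mu_f_x1 = f[x == 1].mean()
    me  = mu_f - mu_x                                            # (3.26)
    mse = (var_f + mu_f**2) - 2.0*mu_x*mu_f_x1 + mu_x            # (3.27)
    cc  = (mu_x*mu_f_x1 - mu_f*mu_x) / np.sqrt(var_f * var_x)    # (3.29)
    return me, mse, cc
```

Note that the sample value of E[fx] equals µ_x µ_{f|x=1} exactly, since the product fx is nonzero only for the pairs with x = 1.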
Next, the Type 2 conditional bias and discrimination from the LBR decomposition can be derived as:

E_x[(\mu_{f|x} - x)^2] = \sum_{x \in \{0,1\}} (\mu_{f|x} - x)^2 t(x) = (1 - \mu_x)\,\mu_{f|x=0}^2 + \mu_x (\mu_{f|x=1} - 1)^2   (3.30)

E_x[(\mu_{f|x} - \mu_f)^2] = \sum_{x \in \{0,1\}} (\mu_{f|x} - \mu_f)^2 t(x) = (1 - \mu_x)(\mu_{f|x=0} - \mu_f)^2 + \mu_x (\mu_{f|x=1} - \mu_f)^2   (3.31)
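The two LBR terms follow the same pattern and can be sketched as follows (illustrative names; plug-in estimates of t(x) are the subsample frequencies):

```python
import numpy as np

def lbr_terms(f, x):
    """Type 2 conditional bias (3.30) and discrimination (3.31)."""
    f = np.asarray(f, dtype=float)
    x = np.asarray(x, dtype=float)
    mu_x, mu_f = x.mean(), f.mean()
    mu_f_x0 = f[x == 0].mean()
    mu_f_x1 = f[x == 1].mean()
    ty2 = (1 - mu_x)*mu_f_x0**2 + mu_x*(mu_f_x1 - 1.0)**2             # (3.30)
    dis = (1 - mu_x)*(mu_f_x0 - mu_f)**2 + mu_x*(mu_f_x1 - mu_f)**2   # (3.31)
    return ty2, dis
```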
On the other hand, as mentioned before, the Reliability (REL) and Resolution (RES) from the CR decomposition require information about the marginal distribution s(f), since:

E_f[(\mu_{x|f} - f)^2] = \int_0^1 (\mu_{x|f} - f)^2 s(f) \, df
= \int_0^1 (\mu_{x|f}^2 - 2 f \mu_{x|f} + f^2) s(f) \, df
= \int_0^1 \mu_{x|f}^2 s(f) \, df - 2 \int_0^1 f \, r(f|x=1) t(1) \, df + \int_0^1 f^2 s(f) \, df
= \int_0^1 \mu_{x|f}^2 s(f) \, df - 2 \mu_x \mu_{f|x=1} + \sigma_f^2 + \mu_f^2   (3.32)
E_f[(\mu_{x|f} - \mu_x)^2] = \int_0^1 (\mu_{x|f} - \mu_x)^2 s(f) \, df
= \int_0^1 (\mu_{x|f}^2 - 2 \mu_x \mu_{x|f} + \mu_x^2) s(f) \, df
= \int_0^1 \mu_{x|f}^2 s(f) \, df - 2 \mu_x \int_0^1 r(f|x=1) t(1) \, df + \mu_x^2 \int_0^1 s(f) \, df
= \int_0^1 \mu_{x|f}^2 s(f) \, df - \mu_x^2.   (3.33)

(The last step uses \mu_{x|f} s(f) = t(1) r(f|x=1), so that \int_0^1 \mu_{x|f} s(f)\, df = t(1) = \mu_x.)
Thus, the problem of estimating the CR decomposition measures boils down to one of estimating \int_0^1 \mu_{x|f}^2 s(f) \, df.
3.3.3 Estimation of CR Decompositions

This subsection investigates several ways to estimate the integral

\int_0^1 \mu_{x|f}^2 s(f) \, df.   (3.34)

In the case where the forecasts are discretized, the integral becomes

\sum_{j=1}^{M} \mu_{x|f_j}^2 s(f_j),   (3.35)

where M is the number of bins into which the forecasts are discretized. From this equation, the natural estimator for the integral is the sample average of the \mu_{x|f_i}^2, or:

\frac{1}{N} \sum_{i=1}^{N} \mu_{x|f_i}^2.   (3.36)
Therefore, the simplest estimation of the integral is to use regression to estimate
µx|f from the set of pairs (fi, xi).
This approach suffices for computing the CR decomposition measures. However, the marginal distribution s(f) is informative in its own right and, in the spirit of the DO approach, should not be discarded. The marginal distribution s(f) can be expressed through the marginal and conditional distributions of the LBR factorization as:
s(f) = t(0)r(f |x = 0) + t(1)r(f |x = 1). (3.37)
The conditional distribution of the observations given the forecasts q(x|f) can also
be written as:
q(x|f) = \frac{t(x) r(f|x)}{s(f)} = \frac{t(x) r(f|x)}{t(0) r(f|x=0) + t(1) r(f|x=1)},   (3.38)
and then the expected value of the observations conditional on the forecasts is:

\mu_{x|f} = \sum_{x \in \{0,1\}} x \, q(x|f) = q(x=1|f) = \frac{t(1) r(f|x=1)}{s(f)}.   (3.39)
Thus, estimating the conditional distribution r(f|x) is another way to estimate s(f), µ_{x|f}, and the integral.

From the above discussion, the following three methods to estimate the integral are considered:
1. The Logistic Regression Method (LRM) estimates the conditional mean µ_{x|f} by logistic regression and utilizes Equation (3.36). Logistic regression is a suitable model when the response variable is binary. This approach directly estimates µ_{x|f} by fitting the logistic regression to the pairs of observations and forecasts; in essence, what is estimated is the conditional distribution q(x = 1|f) (and q(x = 0|f) = 1 − q(x = 1|f)). The integral is then obtained by equal weighting of the µ_{x|f_i}, i.e., Equation (3.36).
2. The Kernel Density estimation Method (KDM) estimates the conditional distribution r(f|x) by the kernel density estimation method. Equations (3.37) and (3.39) are used for the numerical integration of Equation (3.34). Thus, this approach uses the LBR factorization to reconstruct the joint distribution.
3. The Combination Method (CM) estimates the conditional mean µx|f by logistic
regression, and the marginal distribution s(f) by kernel density method; this
approach rebuilds the joint distribution through the CR factorization. Then,
Equation (3.34) is numerically integrated.
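Method 1 (LRM) can be sketched as follows. To keep the example self-contained, the logistic fit is done with a small Newton-Raphson (IRLS) loop rather than a statistics package; the function names are illustrative, not from the report.

```python
import numpy as np

def fit_logistic(f, x, iters=25):
    """Fit mu_{x|f} = 1 / (1 + exp(-(b0 + b1 f))) by Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones_like(f), f])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + 1e-9 * np.eye(2)   # Hessian, tiny ridge
        beta = beta + np.linalg.solve(H, X.T @ (x - p))
    return beta

def cr_integral_lrm(f, x):
    """LRM estimate of the integral (3.34) via the sample average (3.36)."""
    f = np.asarray(f, dtype=float)
    x = np.asarray(x, dtype=float)
    b0, b1 = fit_logistic(f, x)
    mu_xf = 1.0 / (1.0 + np.exp(-(b0 + b1 * f)))
    return np.mean(mu_xf**2)
```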
The kernel density method is adopted because (1) since the correct distributional model is not known a priori, the error introduced by specifying a parametric model may be greater than the error a nonparametric model produces from a small sample size; and (2) the kernel density method is motivated by the limiting case of the averaged shifted histogram, which is a computationally and statistically efficient density estimator (Scott 1992).
For the first approach, the marginal distribution s(f) is also estimated by the kernel density estimation method. Although the estimation of µ_{x|f} in Equation (3.36) is enough to obtain the CR decomposition estimates, the marginal distribution s(f) graphically indicates the important aspect of the sharpness of the forecasts. The conditional distribution r(f|x), related to a measure of discrimination, is also obtained by
r(f|x=1) = \frac{s(f)\, \mu_{x|f}}{t(x=1)}   (3.40)

r(f|x=0) = \frac{s(f)\, (1 - \mu_{x|f})}{t(x=0)}.   (3.41)
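A sketch of the KDM reconstruction (Equations (3.37), (3.39), and the integral (3.34)) using Gaussian kernels is given below. Note that `gaussian_kde` places some probability mass outside [0, 1]; this simple version ignores that and integrates over the unit interval only. Function names are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import trapezoid

def cr_integral_kdm(f, x, ngrid=401):
    """KDM estimate of (3.34): reconstruct s(f) and mu_{x|f} from kernel
    density estimates of r(f|x) via the LBR factorization."""
    f = np.asarray(f, dtype=float)
    x = np.asarray(x, dtype=float)
    t1 = x.mean()
    r0 = gaussian_kde(f[x == 0])                   # r(f|x=0)
    r1 = gaussian_kde(f[x == 1])                   # r(f|x=1)
    grid = np.linspace(0.0, 1.0, ngrid)
    s = (1.0 - t1) * r0(grid) + t1 * r1(grid)      # (3.37)
    mu_xf = t1 * r1(grid) / np.maximum(s, 1e-12)   # (3.39)
    return trapezoid(mu_xf**2 * s, grid)           # (3.34)
```

The Combination Method (CM) follows the same skeleton, with µ_{x|f} taken from the logistic fit instead of the kernel ratio.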
The above methods for estimating the CR decompositions are compared in Chapter 4 with the traditional method, referred to as DSC, which discretizes (or bins) the probabilistic forecasts into 11 or 12 bins. The technical methods used for LRM, KDM, CM, and DSC are described in Appendix A.
3.4 Example of Verification
The verification of June-September seasonal volume forecasts is used as an
example to illustrate the proposed verification approach. The forecasts were pro-
duced by the experimental forecasting system based on the Extended Streamflow
Prediction concept. The forecasting system issues the probability distribution forecast at Stratford, Iowa, every June 1st. By considering an event that the seasonal volume Y is less than or equal to a threshold y_p, a probabilistic forecast is then derived from the probability distribution forecast. The corresponding continuous observation is discretized to 0 for no occurrence of the event, or 1 for its occurrence. Since forecasts were issued each year from 1949 to 1996, 48 pairs of probabilistic forecasts and discrete observations make up one verification dataset. Nine quantiles y_p
are used as thresholds, so the probability distribution forecasts are assessed overall
through nine verification datasets.
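The construction of the nine verification datasets can be sketched as follows. Here each probability distribution forecast is represented by an array of ensemble traces, which is an assumption made for illustration; the names are hypothetical.

```python
import numpy as np

def verification_datasets(ensembles, obs, thresholds):
    """Turn distribution forecasts into (f, x) pairs, one dataset per threshold.

    ensembles : (n_years, n_traces) array of forecast traces of seasonal volume
    obs       : (n_years,) observed seasonal volumes
    thresholds: event thresholds y_p (e.g., climatological quantiles)
    """
    datasets = {}
    for yp in thresholds:
        f = (ensembles <= yp).mean(axis=1)   # forecast prob. of the event Y <= y_p
        x = (obs <= yp).astype(float)        # 1 if the event occurred, else 0
        datasets[yp] = (f, x)
    return datasets
```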
The following examines the measures of forecast quality, and distributions
of forecasts and observations that were introduced in Section 3.2. The integral
necessary for computing the CR decompositions is estimated by LRM (see Appendix A for details). The measures of forecast quality are plotted against the nonexceedance probability p corresponding to the threshold y_p, which can simply be thought of as the magnitude of the flow event.
3.4.1 Absolute and Relative Measures
First, absolute measures are examined. Mean Error (ME) in the left of Figure 3.1 shows that the forecasting system tends to underestimate the occurrence of low flow events, and overestimate the occurrence of moderate and high flow
Figure 3.1: Mean Error (ME, left) and Mean Square Error (MSE, right) for June-September seasonal volume forecasts, plotted against nonexceedance probability p.
events. Mean Squared Error (MSE) in the right of Figure 3.1 shows a downward concave shape; the low and high flow events have better absolute accuracy than moderate flow events. Figure 3.2 shows the CR and LBR decompositions of MSE for June-September seasonal volume forecasts. The estimated Uncertainty almost matches the analytical solution σ²_x = p(1 − p). The Reliability is small and nearly constant, whereas moderate events have more Resolution. The LBR decompositions also have greater values for moderate flow events.
Since the magnitudes of the absolute measures strongly depend on p, one should not evaluate the forecasting system over the range of outcomes using these absolute measures alone. Relative measures compare the forecasts for each event with the climatology forecast µ_x. The CR decompositions of the MSE Skill Score are shown in the upper left of Figure 3.3. Examination of the MSE Skill Score indicates that the forecasts for moderate flow events have more skill than those for extreme flow events. Relative Reliability (RREL) shows that the reliability is almost the same over the range of outcomes, in the sense of its contribution to the Skill Score, while Relative Resolution (RRES) indicates that the forecasts for p = 0.25 have the highest resolution. The LBR decompositions of the MSE Skill Score are shown in the upper right of Figure 3.3. The low and high flow events have more Relative Type 2 Conditional Bias (RTY2), which cannot be seen from the absolute Type 2 Conditional Bias (TY2); its magnitude is much larger than that of the other terms. According to Relative Sharpness (RS) and Relative Discrimination (RDIS), the forecasts for moderate flow events have more sharpness and discrimination than those for high and low flow events. Examination of the lower left of Figure 3.3 illustrates that the potential
Figure 3.2: CR (left) and LBR (right) decompositions of MSE for June-September seasonal volume forecasts: Uncertainty, Reliability (Type 1 Conditional Bias), and Resolution on the left; Sharpness, Type 2 Conditional Bias, and Discrimination on the right, each plotted against nonexceedance probability p.
Figure 3.3: Various decompositions of the MSE Skill Score for June-September seasonal volume forecasts. The upper left shows the CR decomposition: Relative Resolution (RRES) and Relative Reliability (RREL). The upper right shows the LBR decomposition: Relative Discrimination (RDIS), Relative Sharpness (RS), and Relative Type 2 Conditional Bias (RTY2). The lower left shows Potential Skill, Reliability Measure, and Unconditional Bias Measure.
skill of extreme low flow events is lower than that of extreme high flow events.
3.4.2 Marginal and Conditional Distributions
The marginal and conditional distributions of forecasts and observations pro-
vide more details of forecast quality than scalar measures shown above. The main
diagrams to display these distributions are called Reliability diagrams and Discrimination diagrams (Murphy 1997; Wilks 1995).
Figure 3.4 shows the Reliability diagram consisting of the marginal distribu-
tion s(f) and conditional distribution q(x = 1|f) = µx|f . The marginal distribution
s(f) is estimated by the kernel density estimation method, and the conditional distribution µ_{x|f} is obtained by fitting a logistic regression. The marginal distribution s(f) indicates how sharp (or confident) this forecasting system is. No density lies at f = 1, and most of the density concentrates near f = 0. Since perfect forecasting systems have mass points at 0 and 1, this system's forecasts are not very sharp.
In the Reliability diagram, Resolution measures the distance between the sample points and the line µ_{x|f} = µ_x (no resolution), whereas Reliability measures the distance between the sample points and the line µ_{x|f} = f (perfect reliability). It can be seen that the forecasting system tends to overestimate the occurrence of events when it issues probabilistic forecasts below about 0.37; for probabilistic forecasts above that, it underestimates the occurrence of events. Comparison with the no-resolution line indicates that this system has good Resolution. From Equation (3.13), the straight line midway between the line µ_{x|f} = f and the line µ_{x|f} = µ_x is called the no-skill line. Subsamples of forecasts contribute positively to the Skill Score if the corresponding points (f, µ_{x|f}) lie to the right (left) of the vertical line f = µ_x and above (below) the no-skill line (see Murphy 1997). The negative contributions to the Skill Score occur around the intersection of the lines µ_{x|f} = f and µ_{x|f} = µ_x.
The Discrimination diagram comprises the conditional distributions of forecasts given observations r(f|x) and the marginal distribution of observations t(x) (Figure 3.5). The conditional distributions r(f|x = 0) and r(f|x = 1) are estimated from Equations (3.40) and (3.41). For a perfect forecasting system, the modes of r(f|x = 0) and r(f|x = 1) would be located at f = 0 and f = 1, respectively. Here, therefore, the forecasts when events occur (x = 1) are worse than those when events do not occur.
3.5 Discussion
Among the various aspects of forecast quality, measures of accuracy or skill have been studied most. In general, most measures of skill (or accuracy) fall into the following four groups (Zhang and Casey 2000):
1. Those that directly measure the differences between forecasts and observations (e.g., Root Mean Squared Error (RMSE) or the Brier score). Murphy (1995) identified these measures as the "squared-error approach", and linear measures of correspondence such as Mean Absolute Error as the "linear-distance approach".
2. Those that measure the differences between forecasts and observations in cumulative probability space (the ranked probability score (RPS), the Linear Error in Probability Space score (LEPS)). Murphy (1995) termed this the "linear-error-in-probability-space approach".
Figure 3.4: Reliability diagram for June-September seasonal volume forecasts issued for the 0.25 quantile. The upper panel shows the marginal distribution s(f); the lower panel plots the observed relative frequency µ_{x|f} against the forecast probability f, together with the no-resolution, no-skill, and perfect-reliability lines.
Figure 3.5: Discrimination diagram for June-September seasonal volume forecasts issued for the 0.25 quantile, showing the likelihoods r(f|x) against forecast probability f for x = 0 and x = 1, with t(x = 0) = 0.771 and t(x = 1) = 0.229.
3. Those based on concepts derived from signal detection theory (SDT), which produce various measures from the ratios of relative "signal" and "noise" expressed by the conditional distributions of forecasts given observations (e.g., Relative Operating Characteristics (ROC)).
4. Those based on converting probability forecasts to binary forecasts and the
generation of a contingency table from the hit and miss rates.
Murphy (1997) presented a similar classification. Most researchers use the ensemble spread and rank histograms to characterize the ensembles themselves, and use the CR decomposition, Brier score, RPS, or ROC for the probability forecasts (Hamill and Colucci 1997; Hou et al. 1998). It is desirable to apply a number of different scoring techniques, rather than just one scoring scheme, in order to obtain an objective assessment of any given forecast scheme (Murphy 1991; Zhang and Casey 2000). For example, a properly weighted combination of forecasts from different models gives better mean-square errors, although the LEPS, which gives higher scores and less penalty for forecasting rare events, does not always show better results (Zhang and Casey 2000). It is also important to determine which scoring methods are more robust to the effects of sampling variability and better suited to small samples.
On the other hand, new scores for ensemble prediction systems (EPS) are still under development. Wilson et al. (1999) introduced a score expressed as the probability of occurrence of the observation given the EPS distribution; that is, the measure assesses the ensemble outputs in terms of probability. This measure would be useful for seeing how the bias correction methods discussed in Chapter 5 change the distribution of ensemble traces. Hersbach (2000) showed that for an EPS the continuous ranked probability score (CRPS) can be decomposed into a reliability part and a resolution/uncertainty part, in a way similar to the decomposition of the Brier score. The reliability is closely related to the rank histogram, and is sensitive to the width of the ensemble bins. The resolution expresses the superiority of a forecast system with respect to a forecast system based on climatology. The
reliability part was found to be sensitive both to the average spread within the ensemble and to the behaviour of the outliers (Hersbach, 2000). Since the CRPS can be interpreted as the integral of the Brier score over all possible threshold values (Hersbach, 2000), the use of CRPS would be another approach to verifying the whole probability distribution forecast. In this case, another decomposition, corresponding to the LBR decomposition of the Brier score, would have to be derived in order to look at forecast quality given the observations. Still, since the CRPS is a single scalar, information on how forecast quality varies over the range of possible outcomes may not be obtained. Stensrud and Wandishin (2000) extended the critical success index to measure the agreement between two or more spatial distributions of ensemble forecasts and the observation, a measure called the correspondence ratio.
3.6 Summary and Conclusions
All the measures of the distributions-oriented (DO) approach are derived from the joint distribution of forecasts and observations. These measures can be examined over the range of thresholds for which the forecasts are issued. The joint distribution can be examined through the calibration refinement (CR) factorization and the likelihood-base rate (LBR) factorization.
Measures of unconditional bias and accuracy take well-known forms; uncondi-
tional bias is defined by the mean error (ME), and accuracy is Mean Square Error
(MSE) between forecasts and observations. As a relative measure of accuracy, the
MSE Skill Score is introduced. It uses the climatology forecast, or the mean of the observations, as the reference. Decomposition of the MSE reveals other aspects of forecast
quality. CR decompositions, which are Reliability and Resolution, are conditioned
on the forecasts. Reliability is a measure of conditional bias given a forecast. Res-
olution measures how much the outcomes given probabilistic forecasts are different
from the climatology (or the mean of observations). Thus, smaller Reliability and
larger Resolution are more desirable. LBR decompositions, which are Sharpness,
Type 2 Conditional Bias and Discrimination, are conditioned on the observations.
Sharpness is simply the variance of the forecasts, which is especially important for probabilistic forecasts. Sharpness measures how distinct the issued forecasts are, which reflects the confidence of the forecasters. Type 2 Conditional Bias and Discrimination
are based on the same concepts as Reliability and Resolution. Type 2 Conditional
Bias is a measure of conditional bias given an observation. Discrimination measures
how much the forecasts when events occurred are different from those when events
did not occur. Therefore, smaller Type 2 Conditional Bias, and larger Discrimina-
tion and Sharpness are better. In order to compare these five decompositions over
the magnitude of possible outcomes, they are normalized by Uncertainty, or variance
of observations. The normalized values are referred to as “Relative” measures.
In practice, forecasts are often issued as discrete numbers for various reasons. However, discretization, especially with a small sample size, may affect the measures of forecast quality: information about the forecasting system may be lost or distorted. This chapter derived estimators for the above DO measures so that they can be handled in a continuous manner. It turned out that all the measures except the CR decompositions can be expressed by six basic statistics, without any assumption on the mathematical form of the distributions. The problem of estimating the CR decomposition then boils down to estimating the integral \int_0^1 \mu_{x|f}^2 s(f)\, df, where \mu_{x|f} is the conditional mean of the observations given the forecasts, and s(f) denotes the marginal distribution of the forecasts.
Three statistical methods to estimate the integral were explained in detail. The logistic regression method (LRM) estimates the conditional mean \mu_{x|f} by logistic regression, and the integral is estimated by the sample average of \mu_{x|f}^2. The kernel density estimation method (KDM) estimates the conditional distribution of forecasts given observations r(f|x) by the kernel density estimation method; from these distributions, the marginal distribution of forecasts s(f) and the conditional distribution of observations given forecasts q(x = 1|f) are computed, and the integral is then evaluated numerically. The combination method (CM) estimates the conditional mean \mu_{x|f} by logistic regression and the marginal distribution s(f) by the kernel density method, and then evaluates the integral numerically. The traditional discrete approach with a contingency table (DSC) is also considered. In general, if forecasts and observations are divided into I and J bins, the dimensionality D is defined as D = I × J − 1. In the case of a contingency table that divides probabilistic forecasts into I = 11 bins from 0 to 1 with a 0.1 interval, the dimensionality is D = 11 × 2 − 1 = 21, since the observations are dichotomous (J = 2). The three continuous approaches, LRM, KDM, and CM, reduce the dimensionality to 9, 7, and 9, respectively.
CHAPTER 4
DISTRIBUTIONS-ORIENTED METHODS FOR SMALL VERIFICATION DATASETS
The distributions-oriented (DO) approach based on the joint distribution of
forecasts and observations is superior to the measures-oriented (MO) approach for
forecast verification. However, when applying the DO approach to hydrological
forecasts, which typically have a small verification dataset, it is difficult to estimate
the joint distribution and DO measures properly. This chapter examines three
statistical methods to reduce the estimation uncertainty of DO measures. Three
forecasting systems are developed to produce verification datasets, to which the
three statistical methods are applied. This chapter describes and discusses each
forecasting model and the verification results.
4.1 Introduction
The distributions-oriented (DO) approach gives structure to the verification process and indicates in what aspects the forecasts are good or bad, based on the joint distribution of the forecasts and corresponding observations. In essence,
applying the DO approach to real verification problems is equivalent to estimating
the joint distribution of forecasts and observations.
The dimensionality D, one of the characteristics of a verification problem, is defined as the number of degrees of freedom in estimating the joint distribution. Since available samples for hydrological variables are very limited, lower dimensionality is more desirable. In some applications, reduction of dimensionality has been carried out. For instance, Brooks et al. (1996) dealt with temperature forecasts produced by Model Output Statistics (MOS). In order to reduce the dimensionality, they chose to verify forecasts and observations in the context of day-to-day temperature change. The forecasts and observations were binned into 5°F intervals. As a result, they succeeded in reducing the dimensionality from D = 389016 to 120. However, reducing the number of categories for forecasts could result in losing or distorting information about the original forecasting system. Thus, not changing the original forecasts but applying a parsimonious statistical model to the conditional
and/or unconditional distributions may be more reasonable and effective (Murphy, 1991). Murphy and Wilks (1998) modeled the conditional distribution q(x|f), or µ_{x|f}, with a linear regression equation, and the marginal distribution s(f) with a beta distribution, to reduce the dimensionality of the underlying verification problem from D = 11 × 2 − 1 = 21 to 4. This research also makes use of statistical models to reduce dimensionality.
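The dimensionality bookkeeping used throughout this chapter is simple enough to state as code (a trivial sketch; the function name is illustrative):

```python
def dimensionality(n_forecast_bins, n_obs_categories):
    """Degrees of freedom in estimating the discrete joint distribution:
    D = I * J - 1 (one is lost because the probabilities sum to one)."""
    return n_forecast_bins * n_obs_categories - 1

# DSC contingency table: 11 forecast bins, dichotomous observations -> D = 21
assert dimensionality(11, 2) == 21
```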
As described in Section 3.3, the measures of forecast quality, except for the CR decompositions, can be estimated from six basic moment estimators, without any assumptions on the conditional or marginal distributions. The estimation of the CR decompositions, however, requires the estimation of the integral \int_0^1 \mu_{x|f}^2 s(f)\, df. In order to estimate the integral, three statistical methods are utilized (Subsection 3.3.3). The logistic regression method (LRM) estimates the conditional mean \mu_{x|f} by logistic regression; the integral is estimated by the arithmetic average of \mu_{x|f}^2 over the set of forecasts f. The kernel density estimation method (KDM) estimates the conditional distribution of forecasts given observations r(f|x) using kernel density estimation; from these distributions, the marginal distribution of forecasts s(f) and the conditional distribution of observations given forecasts q(x = 1|f) are computed, and the integral is then evaluated numerically. The combination method (CM) estimates the conditional mean \mu_{x|f} by logistic regression and the marginal distribution s(f) by the kernel density method, and then evaluates the integral numerically. This research calls these methods, in which statistical models are used to construct the joint distribution of forecasts and observations, the continuous approach, as opposed to the discrete approach based on the traditional contingency table.
The key questions from the above discussion include:
1. Are continuous approaches for the joint distribution better than the discrete
approach?
2. When is one of the continuous approaches better than the others?
3. How does the performance depend on the nature of the forecasts?
To answer these questions, this chapter examines the three continuous approaches by
evaluating the CR decompositions (or Reliability (REL) and Resolution (RES)) for
many verification datasets. In addition, four other measures, Mean Square Error
(MSE), Mean Error (ME), Type 2 Conditional Bias (TY2), and Discrimination
(DIS), are compared to evaluate the estimation error quantitatively. These aspects
of forecast quality are normalized by the theoretical uncertainty (the variance of the observations, σ²_x = µ_x(1 − µ_x)), so that the results can be compared across the different flow events for which forecasts are issued.
The verification datasets are generated by three different models. The first
two models produce continuous forecasts, and the last one issues discrete forecasts.
In the first investigation, beta distributions are assumed to represent the conditional distributions r(f|x) (analytical model for the joint distribution), so that the true measures of forecast quality are obtained analytically. Second, as a practical case, verification datasets are produced by a stochastic forecasting model that represents monthly streamflow volume (stochastic model of streamflow forecast); here, the true forecast quality measures are taken to be those obtained by the contingency-table method (DSC) with one million pairs of forecasts and observations. The third analysis uses a discrete forecasting model that issues forecasts as discrete numbers directly (discrete joint distribution model); similarly, the true values of the DO measures are calculated by DSC. This third analysis therefore investigates how effective the three continuous approaches are when the forecasts themselves are discrete. Each investigation looks at verification datasets with 50, 100, 200, 400, 600, 800, and 1,000 forecast-observation pairs. For each sample size, 1,000 verification datasets are generated by Monte Carlo methods to evaluate the estimation uncertainty of the forecast quality measures.
4.2 Monte Carlo Simulation with Analytical Model for Joint Distribution
Along with dichotomous observations, probabilistic forecasts are generated
in continuous numbers by beta distributions, which are fitted to the conditional
distribution of the forecasts r(f |x) given the observations. The fitting of a beta
distribution facilitates obtaining the true CR decompositions. The resultant CR
decompositions by the three different continuous approaches (LRM, KDM, and CM) and the 11-binned discrete approach (DSC) are compared and discussed.
4.2.1 Assumptions and Procedure
The random variable X, the discretized observation, has a Bernoulli distribution, while F, the continuous or discretized forecast, has an unknown distribution.
Thus, from the LBR factorization (Equation (3.3)), once the conditional distributions r(f|x = 0) and r(f|x = 1) are specified, any forecast can be generated given the marginal probability t(x). The procedure to generate the verification dataset using beta distributions for the conditional distributions r(f|x) is:

1. Generate the Bernoulli variate x.

2. If the generated observation x is 0, generate the corresponding forecast f based on the conditional distribution r(f|x = 0):

r(f|x=0) = \begin{cases} \frac{1}{B(\alpha_0, \beta_0)} f^{\alpha_0 - 1} (1-f)^{\beta_0 - 1} & \text{for } 0 < f < 1,\ \alpha_0 > 0,\ \beta_0 > 0, \\ 0 & \text{otherwise.} \end{cases}   (4.1)

Similarly, in the case of x = 1, use the conditional distribution r(f|x = 1) with \alpha_1 and \beta_1 in the above equation.
The four beta distribution parameters \alpha_i and \beta_i (i = 0, 1) are chosen, by repeated trial and error, so that the forecasts are unconditionally unbiased and have a positive MSE Skill Score (see Section 3.2 for these definitions). First, the conditional means of the forecasts given the observations, \mu_{f|x=0} and \mu_{f|x=1}, are chosen to satisfy:

\mu_{f|x=0}\, t(0) + \mu_{f|x=1}\, t(1) = \mu_f = \mu_x,

and then each \beta_i is obtained by substituting a chosen \alpha_i into:

\beta_i = \frac{\alpha_i (1 - \mu_{f|x=i})}{\mu_{f|x=i}}.   (4.2)
The true measures, except the CR decompositions, are calculated using the expressions derived in Subsection 3.3.2. To calculate the true CR decompositions, first the marginal distribution of forecasts s(f), the conditional distribution of observations given a forecast, and the mean of observations given a forecast are calculated from Equations (3.37), (3.38), and (3.39). Then, the integral of the CR decompositions, Equation (3.34), is evaluated numerically by Equation (A.10). Finally, the numerically integrated value is substituted into Equations (3.32) and (3.33) to obtain Reliability and Resolution.
Two cases are considered: a moderate case, where t(x = 1) = 0.25, and an extreme case, where t(x = 1) = 0.05. For example, the case t(x = 1) = 0.25 corresponds to forecasts for an event that the volume is less than or equal to the 0.25 quantile of the observations. The four parameters of the beta distributions and the true forecast quality aspects are listed in Table 4.1.
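Under the stated assumptions, the generation procedure and Equation (4.2) can be sketched as follows. Parameter values are taken from Table 4.1 for the moderate case t(x = 1) = 0.25; the function names are illustrative.

```python
import numpy as np

def beta_from_mean(alpha, mu):
    """Eq. (4.2): beta parameter giving a Beta(alpha, beta) mean of mu."""
    return alpha * (1.0 - mu) / mu

def generate_dataset(n, t1, alphas, betas, rng):
    """Steps 1-2: Bernoulli observation x, then f ~ r(f|x) = Beta(alpha_x, beta_x)."""
    x = (rng.random(n) < t1).astype(float)
    f = np.where(x == 1.0,
                 rng.beta(alphas[1], betas[1], n),
                 rng.beta(alphas[0], betas[0], n))
    return f, x

# Moderate case of Table 4.1: t(x=1) = 0.25, alpha = (1.0, 3.0)
rng = np.random.default_rng(0)
alphas = (1.0, 3.0)
betas = (beta_from_mean(1.0, 0.15), beta_from_mean(3.0, 0.55))  # 5.667, 2.455
f, x = generate_dataset(50, 0.25, alphas, betas, rng)
```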
4.2.2 Result and Discussion
The box plots of the four forecast quality aspects that can be estimated without any distributional assumption are shown in Figures 4.1 and 4.2. There are small differences between the two approaches caused by rounding error, because DSC estimates the aspects by reconstructing the joint distribution first, i.e., by calculating the 11 × 2 probabilities. The ranges between maximum and minimum for a sample size of 50 are fairly large for all four aspects. For example, in the case of p = 0.05, one may obtain 0 ≤ MSE/σ²_x ≤ 0.49 with 25% chance (equivalently 0.51 ≤ SS = 1 − MSE/σ²_x ≤ 1), or 1.11 ≤ MSE/σ²_x ≤ 2.46 with another 25% chance (equivalently −1.46 ≤ SS ≤ −0.11). Either range is very different from the true MSE/σ²_x = 0.80 (SS = 0.20), and may lead to a wrong perception of the skill of the forecasting system. The difference between the top (upper quartile, Q3) and bottom (lower quartile, Q1) of the box is referred to as the interquartile range (IQR): IQR = Q3 − Q1. The decrease in the IQR with sample size indicates that the estimation uncertainty decreases as the sample size increases.
In general, the Root Mean Squared Error (RMSE) of an estimator \hat{\theta} is defined as:

RMSE(\hat{\theta}) = \sqrt{SD(\hat{\theta})^2 + BIAS(\hat{\theta})^2}   (4.3)
Table 4.1: Parameters of beta distributions for the analytical model and true forecast quality measures.

            t(x=1)=0.25  t(x=1)=0.05                t(x=1)=0.25  t(x=1)=0.05
α0              1.0          0.25      MSE/σ²x          0.478        0.801
β0              5.667        6.0       ME/σx            0.000        0.000
α1              3.0          0.6       TY²/σ²x          0.360        0.640
β1              2.455        1.9       DIS/σ²x          0.160        0.040
µf|x=0          0.15         0.04      REL/σ²x          0.086        0.016
µf|x=1          0.55         0.24      RES/σ²x          0.608        0.215
σ²x             0.1875       0.0475
Figure 4.1: MSE/σ²x, ME/σx, TY²/σ²x, and DIS/σ²x versus sample size, estimated by two approaches for nonexceedance probability p = 0.25; “D” denotes the discretized (11-bin) approach (DSC), “C” a continuous approach (LRM, KDM, or CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the analytical model.
Figure 4.2: MSE/σ²x, ME/σx, TY²/σ²x, and DIS/σ²x versus sample size, estimated by two approaches for nonexceedance probability p = 0.05; “D” denotes the discretized (11-bin) approach (DSC), “C” a continuous approach (LRM, KDM, or CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the analytical model.
where θ̂ is an estimator of θ, SD denotes the standard deviation √E[(θ̂ − E[θ̂])²], and BIAS is the unconditional bias E[θ̂] − θ. Tables 4.2 and 4.3 show the RMSE for MSE/σ²x, ME/σx, TY²/σ²x, and DIS/σ²x. In a relative sense, the measures for the extreme case have more sampling uncertainty than those for the moderate case. Also, MSE/σ²x (and hence the MSE Skill Score) has the largest uncertainty among these measures.
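Equation (4.3) is an exact identity when SD, BIAS, and RMSE are computed over the same set of Monte Carlo replicates. A short Python sketch that checks it, using the sample mean of normal draws as a stand-in estimator (the distribution N(0.8, 0.3²) and the replicate counts are arbitrary choices for illustration):

```python
import math
import random

def rmse_decomposition(estimates, theta_true):
    """Return (RMSE, SD, BIAS) of Monte Carlo estimates of theta (Eq. 4.3)."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    bias = mean_est - theta_true                              # E[est] - theta
    sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / m)
    rmse = math.sqrt(sum((e - theta_true) ** 2 for e in estimates) / m)
    return rmse, sd, bias

random.seed(1)
# 2000 replicates of the sample mean of 50 draws from N(0.8, 0.3^2)
reps = [sum(random.gauss(0.8, 0.3) for _ in range(50)) / 50 for _ in range(2000)]
rmse, sd, bias = rmse_decomposition(reps, 0.8)
assert abs(rmse ** 2 - (sd ** 2 + bias ** 2)) < 1e-12   # RMSE^2 = SD^2 + BIAS^2
```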
Figures 4.3 and 4.4 show the true values of the conditional mean of observations µx|f, the marginal distribution of forecasts s(f), and the conditional distributions of forecasts given observations r(f|x), for the moderate flow event p = 0.25. Error bars of these distributions for sample size 50 are also shown. It can be seen that DSC has more uncertainty in µx|f, and that the expected value of the estimated µx|f drops near the forecast f = 1. In contrast, the logistic regression of LRM and the kernel density estimation of KDM retain the proper structure. A more dramatic failure in estimation can be seen for the extreme flow event forecasts (Figures 4.5 and 4.6).
Next, the box plots of the CR decompositions, REL/σ²x and RES/σ²x, obtained by the three continuous approaches and the one discrete approach are discussed. Figure 4.7 shows REL/σ²x for the moderate flow event p = 0.25. KDM has the median closest to the true value for sample size 50, although it produces negative estimates. Note that for sample sizes 50 and 100, KDM and LRM give medians closer to the true value than DSC does; from sample size 400 onward, DSC produces the closest median. The median of RES/σ²x (Figure 4.7) shows almost the same behavior as that of REL/σ²x. The BIAS defined by Equation (4.3) for REL/σ²x and RES/σ²x (Tables B.1 and B.3) shows the same pattern. All methods indicate a similar reduction in dispersion (IQR) as the sample size increases, which can be seen in the SD defined by Equation (4.3) (see Tables B.2 and B.4). CM performs worst among them, with the largest range between maximum and minimum for small sample sizes (below 100 or 200).
The RMSE of REL/σ²x and RES/σ²x for the moderate flow event p = 0.25 is shown in Tables 4.4 and 4.5. KDM is the most efficient estimator of REL/σ²x until the sample size reaches 200, followed by LRM. The result for REL/σ²x at sample size 50 is remarkable: KDM gives about half the RMSE of DSC, and the RMSE of LRM is about 28 percent less than that of DSC. On the other hand, KDM and LRM offer only a minor improvement in RES/σ²x.
The CR decompositions for the extreme case are discussed. Extreme flow
Table 4.2: Root Mean Squared Error (RMSE) in MSE/σ²x, ME/σx, TY²/σ²x, and DIS/σ²x for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model.

Size    MSE/σ²x     ME/σx       TY²/σ²x     DIS/σ²x
50      1.079e-001  1.007e-001  9.600e-002  5.273e-002
100     7.546e-002  6.879e-002  6.670e-002  3.733e-002
200     5.244e-002  4.973e-002  4.657e-002  2.674e-002
400     3.786e-002  3.398e-002  3.267e-002  1.821e-002
600     3.065e-002  2.883e-002  2.689e-002  1.584e-002
800     2.655e-002  2.431e-002  2.342e-002  1.339e-002
1000    2.240e-002  2.120e-002  1.975e-002  1.213e-002

Note: these measures were calculated with the continuous approach.
Table 4.3: Root Mean Squared Error (RMSE) in MSE/σ²x, ME/σx, TY²/σ²x, and DIS/σ²x for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model.

Size    MSE/σ²x     ME/σx       TY²/σ²x     DIS/σ²x
50      4.557e-001  1.237e-001  4.271e-001  8.096e-002
100     3.277e-001  9.326e-002  3.160e-001  5.049e-002
200     2.400e-001  6.688e-002  2.315e-001  3.456e-002
400     1.628e-001  4.587e-002  1.575e-001  2.245e-002
600     1.309e-001  3.714e-002  1.273e-001  1.824e-002
800     1.140e-001  3.185e-002  1.093e-001  1.629e-002
1000    1.034e-001  2.816e-002  9.919e-002  1.451e-002

Note: these measures were calculated with the continuous approach.
Figure 4.3: Conditional mean of the observations given the forecasts µx|f and marginal distribution of the forecasts s(f) estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.25 with a sample size of 50. The forecasts are produced by the analytical model.
Figure 4.4: Conditional distribution of the forecasts given the observations r(f|x) estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.25 with a sample size of 50. The forecasts are produced by the analytical model.
Figure 4.5: Conditional mean of the observations given the forecasts µx|f and marginal distribution of forecasts s(f) estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.05 with a sample size of 50. The forecasts are produced by the analytical model.
Figure 4.6: Conditional distribution of the forecasts given the observations r(f|x) estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.05 with a sample size of 50. The forecasts are produced by the analytical model.
Figure 4.7: CR decompositions (REL/σ²x and RES/σ²x versus sample size) estimated by four approaches for nonexceedance probability p = 0.25; “D” is the discretized (11-bin) approach (DSC), “L” is logistic regression (LRM), “K” is kernel density estimation applied directly to r(f|x) (KDM), and “C” is the combination of logistic regression and kernel density estimation (CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the analytical model.
Table 4.4: Root Mean Squared Error (RMSE) in REL/σ²x for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model.

Size    DSC         LRM         KDM         CM
50      1.122e-001  8.061e-002  5.554e-002  1.401e-001
100     6.088e-002  4.868e-002  4.377e-002  8.796e-002
200     3.801e-002  3.463e-002  3.422e-002  6.226e-002
400     2.476e-002  2.561e-002  2.825e-002  4.538e-002
600     1.813e-002  2.036e-002  2.450e-002  3.751e-002
800     1.514e-002  1.859e-002  2.138e-002  3.407e-002
1000    1.428e-002  1.786e-002  1.979e-002  3.211e-002

Note: the smallest value in each row indicates the best estimator.
Table 4.5: Root Mean Squared Error (RMSE) in RES/σ²x for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model.

Size    DSC         LRM         KDM         CM
50      1.792e-001  1.762e-001  1.632e-001  2.093e-001
100     1.184e-001  1.185e-001  1.155e-001  1.386e-001
200     8.680e-002  8.662e-002  8.543e-002  1.005e-001
400     6.142e-002  6.154e-002  6.286e-002  7.163e-002
600     5.112e-002  5.116e-002  5.455e-002  5.885e-002
800     4.220e-002  4.318e-002  4.534e-002  5.108e-002
1000    3.989e-002  4.008e-002  4.228e-002  4.732e-002

Note: the smallest value in each row indicates the best estimator.
events are very important in water resources planning, because they can cause tremendous damage. LRM produces estimates closer to the true REL/σ²x than DSC for all sample sizes of 1000 or less (Figure 4.8). KDM, however, is outperformed by DSC at sample sizes of 200 or more. CM also has a median closer to the true value than DSC for all sample sizes of 1000 or less, although at sample sizes 50 and 100 it has extremely high maxima. The IQR in REL/σ²x by LRM and CM is also slightly smaller than that of DSC for sample sizes of 1000 or less, while KDM has a similar or larger IQR. Again, some negative estimates of Reliability by KDM are found, while LRM and CM give positive ones. In this respect, LRM is more suitable as the estimator of Reliability and Resolution. As for Resolution, LRM also performs better than DSC for sample sizes of 800 or less in terms of the median. The reason why KDM with p = 0.05 does not work well may be the small subsample available to estimate r(f|x = 1) after the sample is stratified by observation. For example, for a sample size of 100, only about five samples with x = 1 are generated.
Tables 4.6 and 4.7 show the RMSE of REL/σ²x and RES/σ²x for the extreme case p = 0.05. For Reliability, LRM is the best estimator for sample sizes of 1000 or less. KDM is better than DSC up to a sample size of 100, whereas CM surpasses DSC from a sample size of 200 onward. For Resolution as well, the estimates by LRM are the best for sample sizes of 600 or less.

From the above results, for the moderate flow event p = 0.25, the KDM estimator is better than the DSC estimator until the sample size reaches 200. For the extreme flow event p = 0.05, LRM is successful in reducing the uncertainty compared to DSC, and is the best estimator for sample sizes of 1000 or less. Note that the continuous approaches achieve better RMSE mostly by reducing the BIAS. For the analytical model of the joint distribution used in the Monte Carlo simulation, the true distribution of µx|f appears to be fitted quite well by logistic regression. In general, this may not be the case. The remaining cases give realistic examples where the fit may not be as good.
Figure 4.8: CR decompositions (REL/σ²x and RES/σ²x versus sample size) estimated by four approaches for nonexceedance probability p = 0.05; “D” is the discretized (11-bin) approach (DSC), “L” is logistic regression (LRM), “K” is kernel density estimation applied directly to r(f|x) (KDM), and “C” is the combination of logistic regression and kernel density estimation (CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the analytical model.
Table 4.6: Root Mean Squared Error (RMSE) in REL/σ²x for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model.

Size    DSC         LRM         KDM         CM
50      2.613e-001  1.664e-001  2.435e-001  1.479e+000
100     1.701e-001  9.102e-002  1.683e-001  4.702e-001
200     1.089e-001  5.150e-002  1.142e-001  6.604e-002
400     6.566e-002  3.351e-002  7.775e-002  4.353e-002
600     4.812e-002  2.723e-002  6.199e-002  3.567e-002
800     3.916e-002  2.500e-002  5.451e-002  3.253e-002
1000    3.269e-002  2.299e-002  4.784e-002  2.968e-002

Note: the smallest value in each row indicates the best estimator.
Table 4.7: Root Mean Squared Error (RMSE) in RES/σ²x for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model.

Size    DSC         LRM         KDM         CM
50      4.124e-001  3.523e-001  4.036e-001  1.439e+000
100     2.704e-001  2.256e-001  2.719e-001  4.926e-001
200     1.754e-001  1.580e-001  1.883e-001  1.687e-001
400     1.133e-001  1.079e-001  1.280e-001  1.141e-001
600     8.875e-002  8.806e-002  1.033e-001  9.276e-002
800     7.841e-002  8.128e-002  9.327e-002  8.551e-002
1000    6.723e-002  6.949e-002  7.931e-002  7.292e-002

Note: the smallest value in each row indicates the best estimator.
4.3 Monte Carlo Simulation with Stochastic Model of Streamflow Forecasting System
A stochastic model of monthly volume forecasts for the experimental Des Moines River system is used for the Monte Carlo simulations. This model produces dichotomous observations and corresponding continuous probabilistic forecasts. The forecast quality aspects calculated from one million pairs of forecasts and observations are taken as the true values for the forecasting system. In developing the stochastic model of the forecasting system, the historical simulation of the monthly volume is assumed to be the observed volume. This assumption eliminates the impacts of hydrologic model biases and errors from the ensemble predictions. The three continuous approaches (LRM, KDM, and CM) and the 11-binned discrete approach (DSC) are compared and discussed.
4.3.1 Assumptions and Procedure
The September monthly volume forecasts with a 1-month lead time are chosen as the forecasts to be modeled. An analysis was made of the September ESP (Extended Streamflow Prediction) forecasts from the experimental system for the Des Moines River. Using the chi-squared goodness-of-fit test and the L-moment ratio diagram, the following assumptions are made:

1. The observed monthly volume U for September has a lognormal distribution.

2. The ensembles of the September monthly volume Y given U have Generalized Pareto (GPA) distributions.

For ensembles with a GPA distribution, the parameters of the distribution can be related to the first three L-moments (x_ℓ1, x_ℓ2, and x_ℓ3). Hence, an ESP forecast and its corresponding observation can be represented by the four random variables {U, X_ℓ1, X_ℓ2, X_ℓ3}. Figure 4.9 shows scatter plots of these four variables for the September forecasts for the Des Moines River for 1948-1997. Note that there are fairly strong associations between the observations and the first L-moment, between the first and second L-moments, and between the second and third L-moments. A stochastic model of the relationships between these variables is used in the following Monte Carlo experiments.
Figure 4.9: Relations between observations and L-moments for September monthly volume (L-moment 1 of ensemble volume versus observation; L-moment 2 versus L-moment 1; L-moment 3 versus L-moment 2).
Table 4.8: Parameters used in fitting the distribution to observed monthly volume (U) and the first three L-moments of the ensemble volumes (X_ℓ1, X_ℓ2, and X_ℓ3).

r.v.    distribution  location (ξ)  scale (α)   shape (k)
X_ℓ1    GEV           32887.133     24156.669   -0.38524433
X_ℓ2    GEV           9068.2572     5657.7145   -0.18209885
X_ℓ3    GEV           4844.7753     2633.4439   -0.0019766099

r.v.    distribution  mean (µ)      s.d. (σ)
U       LN            10.582935     0.9640626
First, the four variables are transformed into standard normal variates using the following transformations:

z_u = Φ⁻¹(F_U(u))   (4.4)
z_ℓ1 = Φ⁻¹(F_1(x_ℓ1))   (4.5)
z_ℓ2 = Φ⁻¹(F_2(x_ℓ2))   (4.6)
z_ℓ3 = Φ⁻¹(F_3(x_ℓ3))   (4.7)

where Φ⁻¹ is the inverse of the standard normal cumulative distribution function (cdf), and F_i represents the cdf of the individual variable. As noted above, F_U is assumed to be a lognormal distribution. Based on an empirical analysis of the L-moments for the 49-year forecast period, each of the L-moments is assumed to have a generalized extreme-value (GEV) distribution. The estimated parameters for these distributions are shown in Table 4.8. Figure 4.10 shows the transformed observations and L-moments. Each scatterplot indicates a strong linear relation. Hence, the relationships between the variables are assumed to follow bivariate normal distributions. Table 4.9 lists the parameters necessary to model the system of observations and forecasts by bivariate normal distributions.
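Given the fitted distributions, the normal-score transform of Equations (4.4)-(4.7) can be sketched in a few lines of Python. The GEV cdf below uses the same convention as the quantile function in Equation (4.11), and the LN parameters are interpreted as log-space mean and standard deviation; the example input volumes are arbitrary:

```python
import math
from statistics import NormalDist

phi_inv = NormalDist().inv_cdf          # standard normal quantile function

def gev_cdf(x, xi, alpha, k):
    """GEV cdf in the convention of Eq. (4.11): F = exp(-(1 - k(x-xi)/alpha)^(1/k))."""
    if k != 0:
        return math.exp(-((1.0 - k * (x - xi) / alpha) ** (1.0 / k)))
    return math.exp(-math.exp(-(x - xi) / alpha))

def lognormal_cdf(u, mu, sigma):
    """Lognormal cdf with log-space mean mu and s.d. sigma (Table 4.8)."""
    return NormalDist(mu, sigma).cdf(math.log(u))

# Normal scores of an observed volume and a first L-moment (Eqs. 4.4-4.5),
# using the fitted parameters from Table 4.8
z_u  = phi_inv(lognormal_cdf(120000.0, 10.582935, 0.9640626))
z_l1 = phi_inv(gev_cdf(60000.0, 32887.133, 24156.669, -0.38524433))
```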
The following steps are carried out to generate forecast-observation pairs from this stochastic model:

1. Generate a lognormal variate, and then transform it into the standard normal variate z_u.

2. Generate a normal variate z_ℓ1 whose distribution has the mean µ_ℓ1 and
Figure 4.10: Scatterplot of transformed observed monthly volume and transformed L-moments of monthly volume ensembles.
Table 4.9: Summary statistics of the standardized random variables.

variable  mean      s.d.    correlation
z_u       9.38e-08  1.00    ρ_uℓ1 = 0.863
z_ℓ1      0.0188    0.982   ρ_ℓ1ℓ2 = 0.986
z_ℓ2      0.0139    0.978   ρ_ℓ2ℓ3 = 0.998
z_ℓ3      0.00924   0.970
variance σ²_ℓ1, given by

µ_ℓ1 = ρ_uℓ1 · z_u   (4.8)
σ²_ℓ1 = 1.0 − ρ²_uℓ1   (4.9)

which come from the conditional pdf of the bivariate normal distribution.

3. Untransform z_ℓ1 to the ensemble first L-moment x_ℓ1 through the GEV:

F_1 = Φ(z_ℓ1)   (4.10)

x_ℓ1(F_1) = ξ_ℓ1 + α_ℓ1{1 − (−log F_1)^k_ℓ1}/k_ℓ1,   k_ℓ1 ≠ 0
x_ℓ1(F_1) = ξ_ℓ1 − α_ℓ1 log(−log F_1),   k_ℓ1 = 0   (4.11)

4. Generate a normal variate z_ℓ2 in the same manner as in step 2; replace ρ_uℓ1 and z_u with ρ_ℓ1ℓ2 and z_ℓ1 in Equations (4.8) and (4.9) to get µ_ℓ2 and σ_ℓ2.

5. Untransform z_ℓ2 to the ensemble second L-moment x_ℓ2 through the GEV.

6. Once more, generate a normal variate z_ℓ3 in the same manner as in steps 2 and 4; substitute ρ_ℓ2ℓ3 and z_ℓ2 for ρ_uℓ1 and z_u in Equations (4.8) and (4.9) to obtain µ_ℓ3 and σ_ℓ3.

7. Untransform z_ℓ3 to the ensemble third L-moment x_ℓ3 through the GEV.

8. Let y_p be a critical threshold, for instance, a low flow during the summer. Then the forecast of the non-exceedance probability for y_p is calculated by

f = F_{Y|U}(y_p) = 1 − e^{−v}   (4.12)

where

v = −k_f⁻¹ ln{1 − k_f(y_p − ξ_f)/α_f},   k_f ≠ 0
v = (y_p − ξ_f)/α_f,   k_f = 0   (4.13)

k_f = (1 − 3 x_ℓ3/x_ℓ2)/(1 + x_ℓ3/x_ℓ2),   (4.14)
α_f = (1 + k_f)(2 + k_f) x_ℓ2,   (4.15)
ξ_f = x_ℓ1 − (2 + k_f) x_ℓ2.   (4.16)
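The eight steps above can be sketched end to end in Python. This is a minimal illustration under the stated assumptions, with parameters from Tables 4.8 and 4.9; the handling of thresholds outside the fitted GPA support (the `arg <= 0` branch and the final clamp) is an added guard, not part of the original procedure:

```python
import math
import random
from statistics import NormalDist

phi = NormalDist().cdf

def gev_inverse(F, xi, alpha, k):
    """Untransform a probability F to a GEV quantile (Eq. 4.11; k != 0 here)."""
    return xi + alpha * (1.0 - (-math.log(F)) ** k) / k

def conditional_normal(rho, z):
    """Draw from the conditional pdf of a bivariate standard normal (Eqs. 4.8-4.9)."""
    return random.gauss(rho * z, math.sqrt(1.0 - rho * rho))

# Fitted GEV parameters (Table 4.8) and correlations (Table 4.9)
GEV = {"l1": (32887.133, 24156.669, -0.38524433),
       "l2": (9068.2572, 5657.7145, -0.18209885),
       "l3": (4844.7753, 2633.4439, -0.0019766099)}
RHO = {"u_l1": 0.863, "l1_l2": 0.986, "l2_l3": 0.998}

def generate_pair(y_p):
    """Generate one (forecast, observed volume) pair for threshold y_p."""
    z_u = random.gauss(0.0, 1.0)                   # step 1: normal score of U
    u = math.exp(10.582935 + 0.9640626 * z_u)      # lognormal observation (Table 4.8)
    z_l1 = conditional_normal(RHO["u_l1"], z_u)    # step 2
    x_l1 = gev_inverse(phi(z_l1), *GEV["l1"])      # step 3
    z_l2 = conditional_normal(RHO["l1_l2"], z_l1)  # step 4
    x_l2 = gev_inverse(phi(z_l2), *GEV["l2"])      # step 5
    z_l3 = conditional_normal(RHO["l2_l3"], z_l2)  # step 6
    x_l3 = gev_inverse(phi(z_l3), *GEV["l3"])      # step 7
    # Step 8: GPA parameters from the L-moments (Eqs. 4.14-4.16)
    k_f = (1.0 - 3.0 * x_l3 / x_l2) / (1.0 + x_l3 / x_l2)
    a_f = (1.0 + k_f) * (2.0 + k_f) * x_l2
    xi_f = x_l1 - (2.0 + k_f) * x_l2
    arg = 1.0 - k_f * (y_p - xi_f) / a_f           # argument of the log in Eq. 4.13
    if arg <= 0.0:
        f = 1.0 if k_f > 0 else 0.0                # threshold outside the GPA support
    else:
        v = -math.log(arg) / k_f                   # Eq. 4.13 (k_f != 0 in practice)
        f = 1.0 - math.exp(-v)                     # Eq. 4.12
    return min(max(f, 0.0), 1.0), u
```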
The true CR decompositions are taken to be those obtained by DSC applied to one million pairs of forecasts and observations. The verification data set is produced for two y_p thresholds, the 0.25 and 0.05 quantiles of the observations (p = 0.25 and p = 0.05), so that the results can be compared with those of the analytical model described in the previous section. Again, the probabilistic forecasts are issued as continuous numbers.
4.3.2 Results and Discussion
The estimates of µx|f for both cases, p = 0.25 and p = 0.05, with sample sizes 50 and 1000, are shown in Figures 4.11 and 4.12. The logistic regression model cannot reproduce the two peaks at f = 0.1 and 0.9 that the true relationship shows. KDM also fails to represent them for a sample size of 50. However, for the case of p = 0.25 with sample size 50, these methods follow the true line with smaller uncertainty than DSC does. With a sample size of 1000, the logistic regression and kernel density estimation still show smaller uncertainties than DSC. For the extreme case p = 0.05 with 50 samples, the mean estimates by DSC are much lower than the true line, while those by LRM and KDM are closer. In the case with 1000 samples, the logistic regression is a highly biased estimator with low variability.
Figure 4.13 shows the box plots of REL/σ²x and RES/σ²x for the moderate flow event p = 0.25. As seen for the analytical model, LRM and CM produce medians closer to the true REL/σ²x than DSC does for sample sizes of 1000 or less. Those of KDM are also closer to the true value than DSC for sample sizes below 600. Surprisingly, the three continuous approaches show less dispersion (IQR) than DSC does. As for Resolution, each continuous approach produces medians closer to the true RES/σ²x than DSC does for sample sizes of 800 or less, while the IQRs are very similar to each other. Figure 4.14 shows the box plots of REL/σ²x and RES/σ²x for the extreme flow event p = 0.05. The median of the Reliability estimator by LRM is closer to the true value than that of DSC for sample sizes of 1000 or less. CM performs the worst for Reliability at sample sizes 50 through 200, and the Reliability estimates by KDM and CM are negative for the small sample sizes. LRM has the smallest uncertainty (IQR) over all the sample sizes, while KDM and CM have large uncertainty at sample sizes 50 and 100. For Resolution, the medians of the LRM estimators are closer to the true value for all the sample sizes studied.
Finally, the RMSE for REL/σ²x and RES/σ²x is discussed. For the moderate threshold, p = 0.25 (Table 4.10), KDM is the best estimator of Reliability for the small sample sizes of 50 and 100; from sample size 200 onward, LRM becomes the best. Table 4.11 shows that CM achieves the lowest error in Resolution for sample sizes of 1000 or less. Indeed, all the continuous approaches are better for Resolution
Figure 4.11: Conditional mean of the observations given the forecasts µx|f estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.25 with sample sizes 50 and 1000. The forecasts are produced by the stochastic model.
Figure 4.12: Conditional mean of the observations given the forecasts µx|f estimated by three methods, DSC, LRM, and KDM, for nonexceedance probability p = 0.05 with sample sizes 50 and 1000. The forecasts are produced by the stochastic model.
Figure 4.13: CR decompositions (REL/σ²x and RES/σ²x versus sample size) estimated by four approaches for nonexceedance probability p = 0.25; “D” is the discretized (11-bin) approach (DSC), “L” is logistic regression (LRM), “K” is kernel density estimation applied directly to r(f|x) (KDM), and “C” is the combination of logistic regression and kernel density estimation (CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the stochastic model.
Figure 4.14: CR decompositions (REL/σ²x and RES/σ²x versus sample size) estimated by four approaches for nonexceedance probability p = 0.05; “D” is the discretized (11-bin) approach (DSC), “L” is logistic regression (LRM), “K” is kernel density estimation applied directly to r(f|x) (KDM), and “C” is the combination of logistic regression and kernel density estimation (CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the stochastic model.
with small samples than the discrete approach. LRM is still the best estimator for p = 0.05 for sample sizes of 1000 or less, although the logistic regression produces a biased distribution of the conditional mean µx|f, as discussed before. Its lowest error appears to be achieved through its very low variability.

In conclusion, all the continuous approaches achieve less error in Reliability (REL) and Resolution (RES) than the discrete approach for both the moderate and extreme flow events. By imposing some structure on the distribution of µx|f, the continuous approaches reduce the variability and/or the bias. Note that kernel density estimation has more flexibility in the estimation of µx|f than logistic regression, whereas logistic regression shows lower variability. For the moderate flow event p = 0.25, each continuous approach produces a median of the REL (RES) estimator closer to the true value than the discrete approach for sample sizes 50 to 600 (1000). Moreover, KDM is the best estimator of REL for the small sample sizes of 50 and 100, while CM has the least error in RES for all the sample sizes studied. For the extreme flow event p = 0.05, LRM is the best estimator of REL and RES for small sample sizes.
4.4 Monte Carlo Simulation with Discrete Joint Distribution Model
The verification datasets generated by the analytical and stochastic models consist of forecasts originally issued as continuous numbers between 0 and 1. In those cases, the continuous approaches turned out to work better than the discrete approach for moderate and extreme thresholds with small sample sizes. What if the forecasts are originally issued in a discrete manner? In this section, the forecasts are generated from 12 discrete values, and then the three continuous methods, LRM, KDM, and CM, are applied to the verification datasets. Here, the discrete approach, again referred to as DSC, uses the 12 discrete forecast values from which the forecasts are generated.
4.4.1 Assumptions and Procedure
This example of a discrete forecast system is taken from Wilks (1995, p. 246),
Subjective 12-24-h Projection Probability-of-Precipitation Forecasts for United States
during October 1980-March 1981. Since the conditional distribution q(x|f) and
marginal distribution s(f) are given, the joint distribution is reconstructed from
Table 4.10: Root Mean Squared Error (RMSE) in REL/σ²x for the forecasts generated for nonexceedance probability p = 0.25 by the stochastic model.

Size    DSC         LRM         KDM         CM
50      1.944e-001  6.448e-002  5.833e-002  7.743e-002
100     1.133e-001  3.673e-002  3.536e-002  4.001e-002
200     5.953e-002  2.061e-002  2.587e-002  2.298e-002
400     3.256e-002  1.279e-002  2.049e-002  1.589e-002
600     2.310e-002  1.067e-002  1.793e-002  1.359e-002
800     1.758e-002  8.469e-003  1.661e-002  1.176e-002
1000    1.482e-002  7.713e-003  1.544e-002  1.082e-002

Note: the smallest value in each row indicates the best estimator.
Table 4.11: Root Mean Squared Error (RMSE) in RES/σ²x for the forecasts generated for nonexceedance probability p = 0.25 by the stochastic model.

Size    DSC         LRM         KDM         CM
50      2.272e-001  1.678e-001  1.648e-001  1.565e-001
100     1.456e-001  1.182e-001  1.190e-001  1.093e-001
200     9.322e-002  8.251e-002  8.333e-002  7.752e-002
400     6.368e-002  5.906e-002  5.973e-002  5.564e-002
600     5.327e-002  5.109e-002  5.176e-002  4.892e-002
800     4.381e-002  4.227e-002  4.271e-002  4.043e-002
1000    3.951e-002  3.870e-002  3.952e-002  3.739e-002

Note: the smallest value in each row indicates the best estimator.
Table 4.12: Root Mean Squared Error (RMSE) in REL/σ2x for the forecasts gener-
ated for nonexceedance probability p = 0.05 by the stochastic model.
n      DSC        LRM        KDM        CM
50 9.255e-001 7.682e-001 1.441e+001 3.290e+002
100 7.171e-001 5.617e-001 7.241e-001 6.152e+001
200 5.253e-001 3.972e-001 5.172e-001 3.259e+000
400 3.621e-001 2.720e-001 3.726e-001 5.435e-001
600 2.918e-001 2.283e-001 3.097e-001 2.304e-001
800 2.470e-001 1.982e-001 2.755e-001 2.013e-001
1000 2.067e-001 1.656e-001 2.465e-001 1.689e-001
Note: the underlined value is the smallest in the row.
Table 4.13: Root Mean Squared Error (RMSE) in RES/σ2x for the forecasts gener-
ated for nonexceedance probability p = 0.05 by the stochastic model.
n      DSC        LRM        KDM        CM
50 9.238e-001 7.230e-001 1.447e+001 3.289e+002
100 6.576e-001 4.247e-001 6.957e-001 6.135e+001
200 4.560e-001 2.403e-001 4.538e-001 2.733e+000
400 3.112e-001 1.599e-001 3.222e-001 4.790e-001
600 2.340e-001 1.323e-001 2.692e-001 1.295e-001
800 2.131e-001 1.228e-001 2.604e-001 1.195e-001
1000 1.657e-001 1.028e-001 2.230e-001 1.025e-001
Note: the underlined value is the smallest in the row.
[Figure 4.15 panels: r(f|x=0), r(f|x=1), s(f), and µx|f, each plotted against the probabilistic forecast.]
Figure 4.15: True marginal and conditional distributions of the discrete forecasts.
CR factorizations. The forecast takes on the following twelve values: 0.0, 0.05,
0.1, 0.2, · · · , 0.8, 0.9, 1.0. The verification set is generated using the CR
factorizations as follows:
1. Generate a forecast f from the cumulative distribution function of the forecasts
by drawing a uniform random number.
2. Generate a Bernoulli variate x with the probability π that the event x = 1 occurs
given the forecast f; π is obtained from the conditional distribution of the
observations given the forecasts, π = q(x = 1|f).
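The two steps can be sketched as follows; the marginal s(f) and conditional q(x = 1|f) used here are placeholder choices (a uniform marginal and a perfectly calibrated conditional) standing in for the Wilks (1995) values, which are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

# The 12 discrete forecast values used in this section.
f_values = np.array([0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
                     0.6, 0.7, 0.8, 0.9, 1.0])
s_f = np.full(12, 1.0 / 12)    # placeholder marginal distribution s(f)
q_x1_f = f_values              # placeholder conditional q(x=1|f): calibrated

def generate_verification_set(n):
    """Step 1: draw forecasts from s(f) by inverse-CDF sampling.
       Step 2: draw Bernoulli observations with pi = q(x=1|f)."""
    cdf = np.cumsum(s_f)
    cdf[-1] = 1.0                        # guard against rounding in cumsum
    idx = np.searchsorted(cdf, rng.random(n))
    f = f_values[idx]
    x = (rng.random(n) < q_x1_f[idx]).astype(int)
    return f, x

f, x = generate_verification_set(500)
```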
Note that the unconditional probability of precipitation is t(x = 1) = 0.162,
which lies between the moderate (p = 0.25) and extreme (p = 0.05) cases considered
in the previous sections (Table 4.14).
Table 4.14: Basic information and true forecast quality measures of Subjective 12-24-h Projection Probability-of-Precipitation Forecasts for the United States during October 1980-March 1981 from Wilks (1995).
Sample size  12,402      MSE/σx²   5.39E-01
t(x=1)       0.162       ME/σx     3.96E-02
σx²          0.135718
TY2/σx²      2.86E-01    Dis/σx²   2.17E-01
REL/σx²      6.03E-03    RES/σx²   4.67E-01
4.4.2 Result and Discussion
First, the distribution of µx|f is discussed (Figure 4.16). Even though DSC
does not suffer any loss of information from binning, its estimator with 50 samples
is biased, with larger uncertainty than the continuous methods; the DSC estimator
becomes almost unbiased at a sample size of 200. The logistic regression in LRM
gives a smooth “S” curve; it is biased in both the upward and downward directions
even with 1,000 samples. On the other hand, KDM has smaller bias as a whole with
1,000 samples, which shows the high flexibility of the kernel density estimation
method.
Figure 4.17 shows the box plots for REL/σx² and RES/σx². In this verification
dataset, KDM's medians for REL/σx² and RES/σx² are closer to the true values than
those of any other method. The median of LRM is closer to the true REL/σx²
(RES/σx²) than DSC up to sample sizes of 400 (200), whereas CM also has a median
closer to the true REL/σx² (RES/σx²) than DSC for sample sizes of 200 (100) or
less. According to the IQR for Reliability, the estimators by LRM and KDM have
less uncertainty. For Resolution, the IQRs are almost the same.
Tables 4.15 and 4.16 show the RMSE in REL/σx² and RES/σx². KDM's estimator
for REL/σx² at a sample size of 50 has about one third the RMSE of DSC's. Again,
KDM can yield negative estimates for Reliability, while LRM approaches the true
value from the positive direction only (Figure 4.17). KDM gives the best estimator
for Resolution, which is positive.
The main findings of this analysis are the following. Even though DSC does
not suffer any loss of information from binning, its estimator for µx|f is biased
and has larger uncertainty than the continuous methods. On the other hand, KDM
Figure 4.16: Conditional mean of the observations given the forecasts, µx|f, estimated by three methods, DSC, LRM, and KDM, with sample sizes 50 and 1000. The forecasts are produced by the discrete model.
Table 4.15: Root Mean Squared Error (RMSE) in REL/σ2x for the forecasts gener-
ated by the discrete model.
n      DSC        LRM        KDM        CM
50 2.134e-001 8.891e-002 7.053e-002 1.572e-001
100 1.234e-001 5.120e-002 3.417e-002 8.092e-002
200 6.423e-002 3.345e-002 1.655e-002 5.341e-002
400 3.320e-002 2.640e-002 1.085e-002 4.068e-002
600 2.257e-002 2.369e-002 9.491e-003 3.519e-002
800 1.720e-002 2.226e-002 8.552e-003 3.248e-002
1000 1.378e-002 2.141e-002 7.739e-003 3.062e-002
Note: the underlined value is the smallest in the row.
Table 4.16: Root Mean Squared Error (RMSE) in RES/σ2x for the forecasts gener-
ated by the discrete model.
n      DSC        LRM        KDM        CM
50 2.769e-001 2.221e-001 2.032e-001 2.657e-001
100 1.824e-001 1.555e-001 1.448e-001 1.749e-001
200 1.228e-001 1.108e-001 1.041e-001 1.219e-001
400 8.320e-002 8.134e-002 7.512e-002 8.917e-002
600 6.875e-002 6.927e-002 6.479e-002 7.513e-002
800 5.958e-002 6.134e-002 5.596e-002 6.684e-002
1000 5.154e-002 5.417e-002 4.908e-002 5.918e-002
Note: the underlined value is the smallest in the row.
Figure 4.17: CR decompositions estimated by four approaches for the discrete forecast; “D” is the discretized (12-binned) approach (DSC), “L” is logistic regression (LRM), “K” is kernel density estimation applied directly to r(f|x) (KDM), and “C” is the combination of logistic regression and kernel density estimation (CM). The maximum, upper quartile, median, lower quartile, and minimum are indicated from top to bottom. The forecasts are produced by the discrete model.
shows more flexibility than LRM in the estimation of µx|f. In the estimation of
REL/σx² and RES/σx², the median by KDM was the closest to the true value. KDM's
estimator for Reliability with a sample size of 50 is about three times more
efficient than DSC's. Even in the case where the forecasts are originally issued
as discrete numbers, all the continuous approaches gave better estimators for
REL/σx² and RES/σx² at small sample sizes, such as 50 or 100, than the discrete
approach with a contingency table.
4.5 Summary and Conclusions
This chapter investigated three continuous approaches (LRM, KDM, and CM)
to reduce the estimation error in DO measures. Verification datasets were
generated by three forecasting systems: an analytical model for the joint
distribution, a stochastic model of an ESP streamflow forecasting system, and a
discrete joint distribution model. For these verification datasets, the
distributions of the DO measures, and the marginal and conditional distributions,
were calculated by the three continuous approaches and the discrete approach.
For small samples, the continuous approach with LRM gives the best estimator
of the CR decompositions. It is better than the traditional contingency table
approach, DSC, whether the forecasts are issued as discrete or continuous numbers.
LRM is the best estimator for forecasts issued for extreme events. KDM is also a
better estimator than DSC for small samples, and works better than LRM for
forecasts issued for moderate events. However, the LRM estimator of Reliability
appears to be always positive, whereas that of KDM can be negative; in this
respect, LRM is more desirable. Examination of box plots and decompositions of
RMSE indicated that the continuous approaches achieve better estimation by
reducing both the unconditional bias and the variance of the Reliability and
Resolution estimates.
As seen in the case of the analytical model, the logistic regression models the
conditional mean µx|f very well. However, the LRM approach has difficulty where
the true µx|f has peaks, as illustrated by the stochastic model of the ESP
streamflow forecast. The kernel density estimation has high flexibility, although
this research utilized a simple method to cope with the boundary effect. For
extreme events, the arithmetic average, Equation (3.36), may have a more desirable
feature in that it reflects the marginal distribution of forecasts s(f), rather
than relying on the indirect estimation of s(f) by the kernel density estimation
method in Equation (A.10). The case of the arithmetic average of µ²x|f estimated
by KDM was also examined (not shown here); it produced smaller RMSE for the
extreme flow event p = 0.05
than the original KDM. In both verification datasets generated for the extreme
threshold p = 0.05, only LRM surpasses DSC in terms of RMSE. KDM fails because
the samples available to estimate r(f|x = 1) are so limited, while the logistic
regression can utilize all the forecast samples. For example, KDM fails to
determine the smoothing parameter h for r(f|x = 1) if the standard deviation of
the forecasts given x = 1 is 0, which often happens at the small sample sizes of
50 and 100.
Finally, even if forecasts are originally issued as discrete numbers, the
continuous approaches may yield better estimators than the discrete approach,
especially for small sample sizes, say, less than 100. For forecasts given as
continuous numbers, the discrete approach introduces bias by converting the
continuous forecasts into discrete numbers. Since the continuous approaches
impose some structure on the distribution of µx|f without changing the original
forecasts, it is easy to see why they perform better than the discrete approach
for small sample sizes. However, even for forecasts originally issued as discrete
numbers, the continuous approaches gave better estimates of the measures than the
discrete approach. Moreover, the discrete approach with a contingency table
requires the selection of bin widths to obtain reasonable sample sizes in the
bins, whereas few parameters have to be estimated to use the continuous
approaches. Thus, in terms of implementation, the continuous approaches are
superior to the discrete approach when the verification dataset is small.
CHAPTER 5
ASSESSMENT OF BIAS CORRECTION METHODS FOR ENSEMBLE FORECASTS
This research extends the Distributions-Oriented (DO) approach to the
verification of probability distribution forecasts (or ensemble forecasts) of
streamflow. Using verification datasets derived for discrete events, the quality
of a probability distribution forecast can be assessed over the range of possible
outcomes. This chapter demonstrates the usefulness of the DO approach. Three
types of bias correction methods are applied to ensemble forecasts produced by an
experimental forecasting system for the Upper Des Moines River basin. The DO
approach is used to assess the probabilistic forecasts modified by the bias
correction methods, and the resulting DO measures and distributions of the
forecasts are discussed.
5.1 Introduction
All dynamic models contain some bias. A hydrological model for streamflow
simulation is no exception; it has conditional biases due to input data or model
assumptions. This fact naturally provokes two questions: what are the effects of
biases on the potential use of a hydrological model, and how can these biases be
removed? In recent years, Ensemble Prediction Systems (EPSs), which produce
forecasts based on many realizations from an initial condition, have been gaining
popularity in hydrological and meteorological forecasting. The set of realizations
is called the ensemble. Conditional biases from a hydrological model may propagate
to the ensemble, and then to the probabilistic forecasts obtained by frequency
analysis of the ensemble. However, since the true distribution of the ensemble
(e.g., the one produced by a hydrological model without biases) cannot be
obtained, direct comparison of modified ensembles and true ensembles cannot be
used to assess the bias correction methods. One indirect approach is to examine
the probabilistic forecasts produced with the bias-corrected ensemble. Hence, the
distributions-oriented (DO) approach can be a powerful tool for evaluating bias
correction methods.
To illustrate, we examine the probabilistic forecasts for monthly streamflow
volume observed at Stratford on the Des Moines River. The available historical
Figure 5.1: Example of Bias Correction Method applied to ensemble traces.
record consists of observations from N = 49 years. On each forecast date, the
Hydrological Simulation Program-Fortran (HSPF) produces different realizations
(an ensemble) of streamflow, using the initial hydroclimatological condition of
the basin at the current time t and year-i meteorological information
(i = 1, · · · , N, except the year including t). Thus, N − 1 = 48 traces of
streamflow, starting from the current (forecast) time and continuing for one
year, are obtained. By separating each trace by month, 12 monthly streamflow
volumes with lead times of 0 through 11 months are obtained for each trace.
Frequency analysis of the ensemble volumes for one monthly volume at a given
lead time then produces a probability distribution forecast (see Chapter 2).
Three types of bias correction methods are applied to the ensembles of monthly
streamflow volumes. Changing the ensemble volumes leads to different probability
distribution forecasts (Figure 5.1). As another way of implementing bias
correction, post-hoc recalibration, which minimizes the conditional bias
(Reliability) by a linear transformation, has been suggested by Wilks (2000).
Hou et al. (1998) also suggest a post-hoc recalibration to achieve calibration of
probabilistic forecasts, using rank distributions in conjunction with the
ensemble.
The probabilistic forecast for the event that monthly volume is less than or
equal to a threshold is obtained from the probability distribution forecast. The
corresponding continuous observation is converted into a discrete number: 1
indicates that the event occurred, and 0 that it did not. Hence, the verification
dataset for the event consists of pairs of probabilistic forecasts and
discrete observations. Using the verification datasets derived for discrete
events, the quality of the probability distribution forecast can be assessed over
the range of possible outcomes. The thresholds for which forecasts are issued are
nine quantiles of the observations, with nonexceedance probabilities p = 0.05,
0.10, 0.25, 0.33, 0.50, 0.66, 0.75, 0.90, and 0.95. The verification datasets
produced by the bias correction methods are assessed by the DO measures described
in Section 3.2. In calculating the DO measures, the Logistic Regression Method
(LRM) is used (Subsection 3.3.3). Various lead times are examined to investigate
the relation between the bias of the hydrological model and the measures of
forecast quality.
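The conversion described above can be sketched for a single forecast date and threshold; the observation record, ensemble values, and function name below are invented for illustration:

```python
import numpy as np

def verification_pair(ensemble, observation, threshold):
    """Probabilistic forecast f = fraction of ensemble traces at or below the
    threshold; binary observation x = 1 if the event {volume <= threshold}
    occurred."""
    f = float(np.mean(np.asarray(ensemble, dtype=float) <= threshold))
    x = int(observation <= threshold)
    return f, x

# Hypothetical monthly volumes (cfsd); the threshold is the p = 0.25
# quantile of a toy observation record.
obs_record = np.array([10000.0, 20000.0, 30000.0, 40000.0, 50000.0])
threshold = np.quantile(obs_record, 0.25)
f, x = verification_pair([15000.0, 25000.0, 35000.0, 45000.0],
                         observation=18000.0, threshold=threshold)
```

Repeating this over all forecast dates, one threshold at a time, yields the nine verification datasets assessed by the DO measures.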
5.2 Biases in Historical Simulations
Let Yi be the volume observed in a given month of year i, and Ŷi the
corresponding historical simulation of monthly volume (i = 1, · · · , 49). The main
idea in finding a “correction” for ensemble volumes is to use the relationship
between Yi and Ŷi. To shed light on the characteristics of the monthly streamflow
volume, the mean,
standard deviation (SD) and coefficient of variation (CV) of the observations, and
the Mean Error (ME), Root Mean Square Error (RMSE), and correlation coefficient
(CC) between the observations and historical simulations are calculated. The mean
of the observed monthly volume in Table 5.1 reveals that the wet season of this basin
is from March to July, and the dry season is from August to February. According
to Table 5.2, January, February, August, September, October, and November have
the positive values in Mean Error, which indicates that the monthly volumes tend
to be overestimated by the hydrological model. In contrast, the other months un-
derestimate the monthly volume. Since this basin has high flow events from March
to July and low flow events in other months, this model tends to underestimate
high flow events and overestimate low flow events. This can be also seen from the
time series of the observed monthly volume and historical simulation (Figure 5.2).
The MSE (Mean Square Error) Skill Score (SSMSE), a relative measure of
accuracy, is also calculated for each month. It compares the MSE (an absolute
accuracy measure) for the historical simulations with that for climatology (the
mean of the observations), as SSMSE = 1 − (RMSE/SD)². The SSMSE indicates that
the January historical simulation has the lowest accuracy of the twelve months.
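These per-month statistics can be sketched generically as below; the function name is illustrative, and the sign convention ME = mean(simulation − observation) matches the text's reading that positive ME means overestimation:

```python
import numpy as np

def simulation_scores(obs, sim):
    """Mean Error, RMSE, correlation coefficient, and the MSE skill score
    SSMSE = 1 - (RMSE/SD)^2, where climatology is the mean of the obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    me = err.mean()                          # positive -> overestimation
    rmse = np.sqrt((err ** 2).mean())
    cc = np.corrcoef(obs, sim)[0, 1]
    ss_mse = 1.0 - (rmse / obs.std()) ** 2   # 1 = perfect, 0 = no skill
    return me, rmse, cc, ss_mse

# A perfect simulation scores ME = 0, RMSE = 0, CC = 1, SSMSE = 1.
me, rmse, cc, ss_mse = simulation_scores([1.0, 2.0, 3.0, 4.0],
                                         [1.0, 2.0, 3.0, 4.0])
```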
Table 5.1: Mean, Standard Deviation (SD), and Coefficient of Variation (CV) of the observed monthly volume (cfsd) for the Des Moines River at Stratford.
Month Mean SD CV Month Mean SD CV
Jan 18022 21501 1.19 Jul 109626 137366 1.25
Feb 27432 40896 1.49 Aug 50769 74678 1.47
Mar 116214 101187 0.871 Sep 40627 53710 1.32
Apr 176127 168633 0.957 Oct 43520 55086 1.27
May 134802 111722 0.829 Nov 38202 44075 1.15
Jun 148183 137626 0.929 Dec 29157 35102 1.20
Table 5.2: Mean Error (ME), Root Mean Square Error (RMSE), correlation coefficient (CC), and Mean Square Error (MSE) Skill Score (SSMSE) between the observed monthly volumes and historical simulations.
Month MEa RMSEa CC SSMSE Month MEa RMSEa CC SSMSE
Jan 3610 15122 0.831 0.505 Jul -6588 35658 0.970 0.933
Feb 11817 22669 0.921 0.693 Aug 15666 26622 0.957 0.873
Mar -8301 50431 0.870 0.752 Sep 17509 25112 0.950 0.781
Apr -22705 50842 0.967 0.909 Oct 7348 25799 0.893 0.781
May -18143 42335 0.946 0.856 Nov 823 15810 0.935 0.871
Jun -29524 53399 0.952 0.849 Dec -2099 18583 0.866 0.720
a The unit is cfsd.
5.3 Bias Correction Methods
Let Ỹi be the ensemble monthly volume, conditional on the initial hydrological
conditions and the year-i meteorological conditions. It is natural to expect that
the bias in the ensemble volume depends on the month for which the ensemble
volume is issued and on its magnitude. Thus, the bias-corrected ensemble volumes
are obtained through some function fj() for a given month j:

Zi = fj(Ỹi).        (5.1)
Figure 5.2: Comparison of observed monthly volume and historical simulation from January 1988 to December 1997.
The function fj() is then estimated from the set of volumes observed in month j
and the corresponding historical simulations, that is, Yi and Ŷi
(i = 1, · · · , 49). This research investigates a multiplicative correction,
regression, and quantile mapping as the function.
5.3.1 Event-Bias Correction Method
The first method is the Event-Bias Correction method (EBC). This method
assumes the same bias exists for the same historical meteorological input. Smith
et al. (1992) defined a multiplicative corrector as

Zi = [Yi Ŷi⁻¹] Ỹi.        (5.2)

This method corrects the historical simulation perfectly; i.e., when Ỹi = Ŷi,
Zi = Yi. The unique feature of this method is that it assumes the same
multiplicative bias for a given historical meteorological event, regardless of
the magnitude of the ensemble volume. The left panel of Figure 5.3 shows an
example of EBC.
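A sketch of EBC under this assumption (array and function names are illustrative): each ensemble trace is scaled by the observed-to-simulated ratio of the historical year whose meteorology drives that trace.

```python
import numpy as np

def ebc(ensemble, obs_hist, sim_hist):
    """Event-Bias Correction (Eq. 5.2): multiply each ensemble trace by the
    multiplicative bias Y_i / Yhat_i of the matching historical year,
    regardless of the trace's magnitude."""
    ratio = np.asarray(obs_hist, dtype=float) / np.asarray(sim_hist, dtype=float)
    return ratio * np.asarray(ensemble, dtype=float)

# If an ensemble trace reproduces its year's historical simulation exactly,
# EBC returns that year's observation.
corrected = ebc(ensemble=[100.0, 200.0],
                obs_hist=[90.0, 260.0],
                sim_hist=[100.0, 200.0])
```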
5.3.2 Regression-Type Method
The second method uses the expected value of the observations given the
simulated volume, based on a regression between observed volumes and the
corresponding historical simulations. First, consider the simplest case, where
the bias-corrected simulation is given by linear interpolation between the
historical simulations and observations (hereafter RLI). With the order
statistics of the historical simulations denoted by

Ŷ(1) ≤ Ŷ(2) ≤ · · · ≤ Ŷ(n),        (5.3)

and the observation paired with Ŷ(j) denoted by Y′(j), the corrected ensemble
simulation is defined by

Zi = [(Y′(2) − Y′(1)) / (Ŷ(2) − Ŷ(1))] (Ỹi − Ŷ(1)) + Y′(1)        for Ỹi < Ŷ(2)

Zi = [(Y′(j+1) − Y′(j)) / (Ŷ(j+1) − Ŷ(j))] (Ỹi − Ŷ(j)) + Y′(j)        for Ŷ(j) ≤ Ỹi < Ŷ(j+1)        (5.4)

Zi = [(Y′(n) − Y′(n−1)) / (Ŷ(n) − Ŷ(n−1))] (Ỹi − Ŷ(n−1)) + Y′(n−1)        for Ŷ(n−1) ≤ Ỹi
Figure 5.3: Example of the bias correction for a 1-month lead time forecast with the initial condition of January 1949; EBC (Event-Bias Correction method) on the left and RLI (linear interpolation) on the right.
The right panel of Figure 5.3 shows the results of RLI for January 1949. Like
EBC, RLI returns the observed value for each historical simulation, and it also
takes the magnitude of the simulation into account.
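Equation (5.4) can be sketched as a piecewise-linear map from the sorted historical simulations to their paired observations, with the end segments extended for extrapolation (a generic implementation, not the report's code):

```python
import numpy as np

def rli(ensemble, obs_hist, sim_hist):
    """RLI (Eq. 5.4): map each ensemble volume through the piecewise-linear
    relation between sorted historical simulations Yhat_(j) and their paired
    observations Y'_(j), extrapolating with the first and last segments."""
    order = np.argsort(sim_hist)
    ys = np.asarray(sim_hist, dtype=float)[order]  # Yhat_(1) <= ... <= Yhat_(n)
    yo = np.asarray(obs_hist, dtype=float)[order]  # paired observations Y'_(j)
    e = np.asarray(ensemble, dtype=float)
    # Segment index j such that ys[j] <= e < ys[j+1], clipped so values
    # outside the range reuse the end segments (the extrapolation branches).
    j = np.clip(np.searchsorted(ys, e, side="right") - 1, 0, len(ys) - 2)
    slope = (yo[j + 1] - yo[j]) / (ys[j + 1] - ys[j])
    return yo[j] + slope * (e - ys[j])

# Toy pairs with obs = 2 * sim; interior points interpolate and the value
# beyond the largest simulation extrapolates along the last segment.
z = rli(ensemble=[0.5, 1.5, 4.0], obs_hist=[2.0, 4.0, 6.0],
        sim_hist=[1.0, 2.0, 3.0])
```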
Second, consider a power function (RPF), one of the most common regression
functions. The bias-corrected ensemble simulation is given by

Zi = b Ỹi^c.        (5.5)

The parameters b and c are optimized to minimize the sum of squared errors for
each month, using the historical simulations and observed volumes:

Σ_{i=1}^{N} (Yi − b Ŷi^c)² → min.        (5.6)
Figure 5.4 shows the power functions fitted to each set of observations and
corresponding historical simulations for the May and September monthly volumes.
Comparison with the broken 1:1 (no-bias) line reveals that each month contains
conditional bias.
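A sketch of the power-function fit, with one hedge: Equation (5.6) minimizes squared error in the original space, whereas this illustration uses the common log-log least-squares shortcut, which coincides with it only when the power law holds exactly:

```python
import numpy as np

def fit_power(obs, sim):
    """Fit Y ~ b * Yhat^c by linear least squares in log-log space
    (an approximation to the original-space criterion of Eq. 5.6)."""
    ln_sim = np.log(np.asarray(sim, dtype=float))
    ln_obs = np.log(np.asarray(obs, dtype=float))
    c, ln_b = np.polyfit(ln_sim, ln_obs, 1)   # slope = c, intercept = ln(b)
    return np.exp(ln_b), c

# Toy data generated by obs = 2 * sim^2, so the fit recovers b = 2, c = 2.
b, c = fit_power(obs=[2.0, 8.0, 18.0], sim=[1.0, 2.0, 3.0])
```

For the original-space criterion, the log-log estimates are a reasonable starting point for a nonlinear least-squares refinement.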
A third regression-type method, LOWESS (LOcally WEighted Scatterplot
Smoothing), is also investigated (called RLW). In essence, the expected value is
obtained by weighting the samples within a moving window according to their
vertical and horizontal distances. The LOWESS procedure is roughly stated here
(see Cleveland, 1979, for details). Consider a scatterplot of points (xi, yi) for
i = 1, · · · , n. For each xi, a weight function W is used to construct weights
ωk(xi) for all xk (k = 1, · · · , n). In this procedure, centering W at xi and scaling it are
[Figure 5.4 panels: fitted power functions 0.910 x^1.02 (May) and 0.101 x^1.17 (September), plotted against the raw data.]
Figure 5.4: Observed monthly volume versus simulated monthly volume with the fitted power function for May and September.
done so that the point at which W first becomes zero is at the rth nearest
neighbour of xi. Introducing a parameter f, 0 ≤ f ≤ 1, r is defined as the
integer nearest to fn. The initial smoothed value ŷi at each xi is obtained by a
linear regression to the data using weighted least squares with weights ωk(xi).
Then, another set of weights, δi, is constructed for each (xi, ŷi) based on the
size of the residual yi − ŷi, using the weight function W. Just as large
distances xk − xi lead to small weights ωk(xi), large residuals result in small
weights δi, and vice versa. New smoothed values are computed by linear regression
using weighted least squares with weights δiωk(xi).
Cleveland (1979) recommended using PRESS (PRediction Error Sum of Squares)
to estimate the smoothing parameter f. The PRESS statistic is a cross-validation-type
estimator of error, and applying PRESS chooses f so that the regression produces
the least error in making new predictions (Helsel and Hirsch, 1992). The weight
function W used here is the bisquare function:

B(x) = (1 − x²)²        if |x| < 1
B(x) = 0                otherwise        (5.7)
In addition, if any pair of adjacent estimated points gives a negative slope,
the smoothing parameter f is increased in 0.01 increments until a positive slope
is obtained; this yields a one-to-one function. Finally, the bias-corrected
conditional model estimator is obtained by linear interpolation between the
points estimated by LOWESS. Figure 5.5 shows the points estimated by LOWESS
[Figure 5.5 panels: LOWESS fits (f = 0.44) for May and September, plotted against the raw data.]
Figure 5.5: Observed monthly volume versus simulated monthly volume with LOWESS regression for May and September.
and the interpolated segments between them for the May and September monthly
volumes.
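The procedure above can be sketched as a single non-robust LOWESS pass; the residual-based weights δi and the PRESS-based choice of f are omitted, so this is an illustration rather than Cleveland's full algorithm:

```python
import numpy as np

def bisquare(u):
    """Bisquare weight function (Eq. 5.7)."""
    w = np.zeros_like(u)
    inside = np.abs(u) < 1.0
    w[inside] = (1.0 - u[inside] ** 2) ** 2
    return w

def lowess_pass(x, y, f=0.5):
    """One non-robust LOWESS pass: at each x_i, a weighted linear fit with
    bisquare weights scaled so the r-th nearest neighbour gets weight zero,
    where r is the integer nearest to f * n."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = max(2, int(round(f * n)))
    smoothed = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[r - 1]                     # distance to r-th neighbour
        w = bisquare(d / h)
        # np.polyfit weights the unsquared residuals, so pass sqrt(w) to get
        # weighted least squares with weights w.
        beta = np.polyfit(x, y, 1, w=np.sqrt(w))
        smoothed[i] = np.polyval(beta, x[i])
    return smoothed

# A noiseless straight line is reproduced exactly by every local fit.
x = np.arange(10.0)
y = 2.0 * x + 1.0
smoothed = lowess_pass(x, y, f=0.5)
```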
5.3.3 Quantile-Mapping Method
The idea of the third method, the quantile-mapping method (QM), is that the
observed volumes and the historical simulations should have the same cumulative
relative frequency. As with the historical simulations in the linear
interpolation method (Equation 5.3), take the order statistics of the
observations:

Y(1) ≤ Y(2) ≤ · · · ≤ Y(n).        (5.8)

With this set of order statistics, (Ŷ(i), Y(i)), the corrected simulation is
obtained by interpolation; the equations simply replace Y′(i) with Y(i) in
Equation (5.4). This method is based on the one-to-one transformation between the
empirical cumulative distribution functions of the historical simulations and
observations. It therefore assumes that the conditional model estimators obey the
empirical cumulative distribution function (CDF) of the historical simulations.
An example of QM is shown in Figure 5.6.
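A sketch of QM: sort the simulations and observations independently so that equal ranks are paired, then interpolate exactly as in RLI (a generic implementation, not the report's code):

```python
import numpy as np

def qm(ensemble, obs_hist, sim_hist):
    """Quantile mapping: pair sorted historical simulations with sorted
    observations (matching empirical CDFs), then apply the piecewise-linear
    interpolation of Eq. (5.4) with Y'_(j) replaced by the sorted obs."""
    ys = np.sort(np.asarray(sim_hist, dtype=float))
    yo = np.sort(np.asarray(obs_hist, dtype=float))
    e = np.asarray(ensemble, dtype=float)
    j = np.clip(np.searchsorted(ys, e, side="right") - 1, 0, len(ys) - 2)
    slope = (yo[j + 1] - yo[j]) / (ys[j + 1] - ys[j])
    return yo[j] + slope * (e - ys[j])

# The median simulated value (2.0) maps to the median observation (3.0),
# regardless of which year each observation came from.
z = qm(ensemble=[2.0], obs_hist=[3.0, 1.0, 5.0], sim_hist=[1.0, 2.0, 3.0])
```

The contrast with RLI is the pairing: RLI keeps simulation-observation pairs from the same year, while QM pairs values of equal rank.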
5.4 Result and Discussion
5.4.1 Performance Measures
First, the MSE (Mean Square Error) Skill Score (SSMSE) is considered for
each month. The MSE for the probabilistic forecasts of monthly volume is
Figure 5.6: Example of the Quantile Mapping method (QM) for a 1-month lead time forecast with the initial condition of January 1949.
compared with the variance of the observations, which is the MSE for a
climatology forecast (the mean of the observations). As explained in Section 5.1,
the probability distribution forecasts issued by the forecasting system are
assessed at nine quantiles. Thus, nine MSEs are obtained from one probability
distribution forecast for each forecasted month and lead time. The SSMSE was
calculated with the MSE and the variance of the observations averaged over the
nine quantiles. The left-hand side of Figure 5.7 shows the SSMSE versus
forecasted month for 1- to 3-month lead times. Examination of the SSMSE for the
forecasts without bias correction (NBC) indicates a monthly variation. It is
interesting to note that NBC for 2- and 3-month lead times shows similar
patterns, although the pattern for a 1-month lead time is somewhat different.
Another important point is that the relative accuracy of these probabilistic
forecasts shows a different monthly characteristic from that of the historical
simulations (see Section 5.2). Thus, it is incorrect to assume that a month with
high relative accuracy in the historical simulation also has high relative
accuracy in ensemble forecasting. Comparison of the SSMSE for the bias correction
methods with 0 (the no-skill line) shows that all of the bias correction methods
have skill for all months.
In order to compare the SSMSE for the five bias correction methods with NBC,
the Skill Score for Bias Correction (SSBC) is introduced as

SSBC = 1 − MSE / MSE_NBC,        (5.9)
where MSE_NBC is the MSE for the simulations without bias correction. The
right-hand side of Figure 5.7 shows the SSBC versus forecasted month for 1-
through 3-month lead times. These values are also obtained by averaging the MSE
and MSE_NBC in Equation (5.9) over the nine quantiles. Clearly, some months, such
as August or September, benefit from the bias correction methods, but others,
such as May or July, do not. It is interesting to note that RPF failed to improve
on NBC in the winter season: November, December, January, and February.
Examination of the SSBC for a 1-month lead time, depicted at the top of Figure
5.7, reveals that RLI gives the largest improvement in accuracy. The SSBC for
2-month lead times indicates that QM is the best except for March, July, and
November.
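The two skill scores, with the averaging over the nine quantiles, can be sketched as follows; the function name and the toy numbers are illustrative:

```python
import numpy as np

def skill_scores(mse_q, mse_nbc_q, var_obs_q):
    """Average the per-quantile MSEs and observation variances first, then
    form SSMSE = 1 - MSE/Var(obs) against climatology and
    SSBC = 1 - MSE/MSE_NBC (Eq. 5.9) against the uncorrected forecast.
    Both scores are 1 for a perfect forecast and 0 for no skill."""
    mse = np.mean(mse_q)
    ss_mse = 1.0 - mse / np.mean(var_obs_q)
    ss_bc = 1.0 - mse / np.mean(mse_nbc_q)
    return ss_mse, ss_bc

# Toy numbers: the corrected forecast halves the uncorrected MSE and has a
# quarter of the climatological variance.
ss_mse, ss_bc = skill_scores(mse_q=[0.1, 0.1, 0.1],
                             mse_nbc_q=[0.2, 0.2, 0.2],
                             var_obs_q=[0.4, 0.4, 0.4])
```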
From the results of the SSBC, May and September are selected as examples of
small and large improvement for detailed examination. Figure 5.8 shows the SSBC
for the May and September volumes over lead time. None of the bias correction
methods improves the forecast skill much in May, even at a 1-month lead time. For
both months, the best score is obtained by QM at lead times greater than 1 month.
EBC is slightly more accurate at a 1-month lead time. It is speculated that the
multiplicative bias between the historical simulation and the corresponding
observation is well preserved in the ensemble simulation driven by that year's
meteorological input over short lead times.
Generally, the difference between a historical simulation and the
corresponding observation stems from model deficiency and input-data deficiency.
In reality, it is impossible to achieve a perfect simulation. However, in order
to approximate the measures of forecast quality for a perfect simulation model, a
forecasting case where the observations are replaced with the corresponding
historical simulations is considered. In the usual forecasting process, the
observations are discretized into 0 or 1 based on a threshold of monthly volume,
a quantile calculated from the observations. In this case, instead, the
historical simulations are discretized based on the quantiles of the historical
simulations. The DO measures are then calculated for the pairs of discretized
historical simulations and probabilistic forecasts. This forecast is called a
pseudoperfect streamflow simulation (PSS). The result for PSS should approximate
the maximum improvement that any bias correction method can achieve. However, the
set of observations for the PSS differs from the actual set of observations, so
other approaches do give better results in some cases. Still, the result for PSS
is depicted in the following figures as a reference.
September volume is investigated first. Figure 5.9 shows the Bias
Figure 5.7: MSE Skill Score (left) and Skill Score for Bias Correction (right) versus forecasted month for 1, 2, and 3-month lead times, averaged over the quantiles.
Figure 5.8: Skill Score for Bias Correction for May and September monthly volumes, averaged over the quantiles.
(left) and MSE (right) for 1 to 3-month lead times. Without bias correction, the
forecast system has negative values and a "U" shape in the Bias. That is to say,
in an absolute sense, the forecasting system without bias correction tends to under-
estimate the occurrence of the event, especially for moderate flow events. All
the bias correction methods improve the unconditional bias (Mean Error), although
RPF issues forecasts with relatively large bias in the middle of the range. The
Mean Error keeps roughly the same magnitude and shape as the lead time increases;
that is, the Mean Error depends not on the lead time but on the month for which
the forecasts are issued. Comparison of the MSE shown on the right of Figure 5.9
reveals that all the bias correction methods succeed in reducing the MSE, especially
for moderate flow events.
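For a given threshold, the Mean Error and MSE plotted in Figure 5.9 reduce to simple sample statistics of the (f, x) pairs. A minimal sketch, with illustrative data:

```python
import numpy as np

def mean_error(f, x):
    """Unconditional bias: mean forecast probability minus observed relative frequency."""
    f, x = np.asarray(f, float), np.asarray(x, float)
    return float(f.mean() - x.mean())

def mse(f, x):
    """Mean square error (Brier score) of the probability forecasts."""
    f, x = np.asarray(f, float), np.asarray(x, float)
    return float(np.mean((f - x) ** 2))

# Illustrative (f, x) pairs for one threshold
f = [0.1, 0.4, 0.8, 0.9]
x = [0, 0, 1, 1]
me = mean_error(f, x)   # 0.55 - 0.50 = 0.05
bs = mse(f, x)          # (0.01 + 0.16 + 0.04 + 0.01) / 4 = 0.055
```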
Next, the decompositions of the MSE Skill Score (Equation (3.8)) are dis-
cussed. Figure 5.10 shows the MSE Skill Score and the potential skill (the first term
in the decomposition). The MSE Skill Score for NBC shows negative scores for low
and moderate flow events, which means that the forecasts without bias correction
are worse than climatology forecasts. On the other hand, all the bias correction
methods but RLW improve the skill for moderate and low flow events. The
potential skill of NBC for low flow events is poorer than that of the other methods
at a 1-month lead time. It is surprising that all the bias correction methods improve
the potential skill, or squared association, for low flows. This may be related to the
improvement in the Resolution or Discrimination measures, since the other terms in
the Skill Score decompositions capture conditional and unconditional biases. Note
that by definition the potential skill for any method has to be equal to or greater
Figure 5.9: Comparison of Mean Error (left) and Mean Square Error (right) by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts.
Figure 5.10: Comparison of MSE Skill Score (left) and measure of association (right) by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts.
than 0 (no skill).
Figure 5.11 depicts the other terms in the decomposition of the MSE Skill
Score: a measure of conditional bias (reliability) and a relative measure of un-
conditional bias. These terms make the forecasts without bias correction less
accurate than a climatology forecast. The conditional biases (reliability) of some
bias correction methods are worse than those of the original forecasts for low and
moderate flow events, but their contributions to the Skill Score are relatively small.
As seen in the examination of the Mean Error, the relative unconditional bias of the
forecasts without bias correction is also high for moderate flow events, and all the
bias correction methods improve it dramatically. QM decreases the unconditional
bias the most, while RPF retains the most bias among the bias correction methods
for moderate flow events.
As for the May monthly volume forecasts, the performance measures and the
decompositions of the MSE Skill Score for a 1-month lead time are depicted in
Figure 5.12. Compared to the September forecasts, the May monthly volume
forecasts with no bias correction are less biased; they have almost the same
magnitude of bias as the bias-corrected forecasts. Similarly, the MSEs for NBC are
close to those of the bias correction methods. Why did the bias correction methods
make relatively small improvements in the Skill Score? The reasons are that (1) the
original forecasts are well calibrated, except for extreme low flows, and essentially
unbiased overall, and (2) the bias correction methods could not improve the
association over the full range of quantiles, although PSS suggests the possibility of
improvement for low and moderate flow events.
5.4.2 CR Factorization and Decompositions
The distributions s(f) and q(x = 1|f) = µx|f for threshold nonexceedance
probabilities p of 0.05, 0.25, and 0.5, with a 1-month lead time, are depicted in
Figure 5.13. The forecasts are issued for September monthly volume. The marginal
distribution of the forecasts s(f) and the conditional mean µx|f are estimated by
the kernel density estimation method and logistic regression, respectively (see
Subsection 3.3.3).
In the case of p = 0.05, 46 observations take on the value 0, and just 2 take on 1.
The marginal distribution of the forecasts with no bias correction concentrates
near f = 0. RLW issued only f = 0, which means the variance of the forecasts
σ²f = 0. Since the optimal kernel bandwidth h is calculated from σ²f, the kernel
density estimation method fails to estimate the marginal distribution of the
forecasts. As the magnitude of the threshold increases, more of the density of the
marginal distribution of the forecasts shifts toward f = 1. QM shows almost the
same distribution s(f) as PSS for p = 0.05, 0.25, and 0.50. EBC has a flatter
distribution of forecasts for p = 0.5, which leads to lower sharpness than the
others. The results clearly show
Figure 5.11: Comparison of decompositions of Skill Score by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts. The measure of reliability is on the left, and the measure of unconditional bias is on the right.
Figure 5.12: Performance measures and decompositions of MSE Skill Score by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time May monthly volume forecasts.
that the bias correction methods improve sharpness over some ranges of quantiles.
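The kernel estimate of s(f) discussed above can be sketched with a Gaussian kernel. The Silverman bandwidth rule below is an assumption (the report's actual bandwidth formula is given in Subsection 3.3.3); the sketch still shows why the estimate degenerates when every forecast is identical, as for RLW at p = 0.05.

```python
import numpy as np

def kde_s_f(f_samples, grid, h=None):
    """Gaussian kernel density estimate of the marginal forecast
    distribution s(f). The default bandwidth (Silverman's rule, an
    assumption) is proportional to the forecast standard deviation, so
    it is undefined when all forecasts are identical."""
    f_samples = np.asarray(f_samples, float)
    n = f_samples.size
    if h is None:
        sigma_f = f_samples.std(ddof=1)
        if sigma_f == 0.0:
            raise ValueError("sigma_f = 0: bandwidth undefined, KDE fails")
        h = 1.06 * sigma_f * n ** (-0.2)
    # Evaluate the kernel sum on the grid
    u = (np.asarray(grid, float)[:, None] - f_samples[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

grid = np.linspace(0.0, 1.0, 101)
density = kde_s_f(np.array([0.10, 0.20, 0.20, 0.30, 0.60]), grid)
```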
As for the reliability diagrams shown on the right of Figure 5.13, the distribu-
tions of µx|f for RPF, EBC, and QM resemble a step function for p = 0.05, which
does not seem reasonable for µx|f; all methods appear to have significant con-
ditional biases. Only PSS shows a positive contribution to the MSE Skill Score over
all the forecasts for p = 0.05 (see Subsection 3.4 for how to read the diagram).
QM would be the second best (it has more area contributing positively to the Skill
Score). However, RPF achieves the best reliability measure for p = 0.05. Since the
measures of reliability and resolution weight area by the relative frequency s(f),
s(f) has to be considered to obtain proper insight into the measures of the CR
decomposition from the reliability diagram. For the moderate case, p = 0.25, RPF
and EBC have curves closer to y = x, which leads to the smallest Mean Error,
discussed later. The result for p = 0.50 shows that EBC and RLW have inflection
points at the intersection of f = µx and µx|f = µx. This implies that EBC and
RLW make a positive contribution to the MSE Skill Score over all the forecasts
and have small conditional bias.
The relative measures of Reliability and Resolution for the September monthly
volume forecasts, shown in Figure 5.14, are discussed next. For the 1-month
lead time, all the bias correction methods reduce the Relative Reliability (RREL),
a measure of conditional bias, except at low flows. Note that the RREL remains at
almost the same magnitude as the lead time increases. The Relative Resolution
(RRES) measures how much the expected observations given the forecasts differ
from the mean of the observations. The RRES for the bias correction methods
decreases with increasing lead time, while the RRES for the forecasts without bias
correction retains a downward-convex shape. Since subtracting RREL from RRES
determines the MSE Skill Score, the fact that the bias correction methods give
almost the same RRES as the forecasts without bias correction for the 1-month
lead time, along with much smaller RREL, leads to the improvement in accuracy.
However, in cases where the original forecasts are already well calibrated, bias
correction methods may not improve the Skill Score. The CR decompositions of the
May monthly volume forecasts illustrate the same conclusion as the Skill Score
decompositions (Figure B.1).
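The CR (calibration-refinement) decomposition behind these relative measures can be sketched for binned forecasts. This is a simplification: the report estimates µx|f by kernel density estimation and logistic regression rather than binning, and the identity below is exact only when forecasts are constant within each bin.

```python
import numpy as np

def cr_decomposition(f, x, n_bins=10):
    """Calibration-refinement decomposition of the Brier score:
    MSE = uncertainty + reliability - resolution
    (exact when forecasts are constant within each bin)."""
    f, x = np.asarray(f, float), np.asarray(x, float)
    n = f.size
    mu_x = x.mean()
    uncertainty = mu_x * (1.0 - mu_x)
    bins = np.clip((f * n_bins).astype(int), 0, n_bins - 1)
    reliability = resolution = 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        w = mask.sum() / n             # relative frequency s(f) of the bin
        mu_x_f = x[mask].mean()        # conditional mean of x given f
        reliability += w * (f[mask].mean() - mu_x_f) ** 2
        resolution += w * (mu_x_f - mu_x) ** 2
    return reliability, resolution, uncertainty

# The relative measures divide by the uncertainty, and the MSE Skill Score
# is (resolution - reliability) / uncertainty, i.e. RRES - RREL.
f = np.array([0.05] * 4 + [0.85] * 4)
x = np.array([0, 0, 0, 1, 1, 1, 1, 0], float)
rel, res, unc = cr_decomposition(f, x)
skill = (res - rel) / unc
```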
5.4.3 LBR Factorization and Decompositions
The distributions r(f|x) and t(x) for nonexceedance probabilities p = 0.05,
0.25, and 0.5 with a 1-month lead time are depicted in Figure 5.15. The forecasts
Figure 5.13: Marginal distribution of the forecasts s(f) and the conditional mean µx|f by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time September monthly volume forecasts.
Figure 5.14: CR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts.
are issued for September monthly volume. The conditional distributions r(f|x = 0)
and r(f|x = 1) are estimated with a combination of the kernel density estimation
method and logistic regression, by Equations (3.40) and (3.41).
First, the distributions r(f|x) and t(x) are considered. The closer the
conditional mean µf|x=1 (µf|x=0) is to f = 1 (f = 0), the smaller the Type 2
conditional bias. Examination of r(f|x = 1) for p = 0.05 reveals that RLI, RPF, and
EBC issued forecasts around f = 0.5, whereas NBC, PSS, and QM have more
density near f = 0. As also seen for p = 0.25 and 0.50, NBC clearly issues more
forecasts below 0.5 when the observation is x = 1. On the other hand, NBC
produces forecasts closer to 0 for observations with x = 0. The improvement in
r(f|x = 1), rather than any degradation in r(f|x = 0), under bias correction results
in the smaller Type 2 conditional bias depicted in the upper left of Figure 5.19.
The measures of Type 2 Conditional Bias (TY2) and Discrimination (DIS) for
the probabilistic forecasts, defined in Section 3.2, can be written as

TY2 = t(x = 0)(µf|x=0 − 0)² + t(x = 1)(µf|x=1 − 1)²   (5.10)

DIS = t(x = 0)(µf|x=0 − µf)² + t(x = 1)(µf|x=1 − µf)²   (5.11)
These equations indicate that the Type 2 Conditional Bias measures the differences
between x = i and µf|x=i, averaged with the weights t(x = i) (i = 0, 1), and that
Discrimination measures the differences between µf and µf|x=i, averaged with the
same weights. Figure 5.16 shows the conditional means of the forecasts given the
observations, µf|x, for all the bias correction methods, NBC, and PSS. Comparison
of the distance between µf|x=i and x = i (i = 0 or 1) indicates that NBC has
more (Type 2) conditional bias given the observations x = 1 for low flows.
Figure 5.17 compares the conditional means of EBC and QM with those of NBC.
Larger Discrimination alone does not mean more accurate forecasts; the forecasts
must also be conditionally unbiased. The figure illustrates that the two methods,
QM and EBC, succeed in enlarging the difference between µf|x=1 and µf for low
flows while making µf close to µx.
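Equations (5.10) and (5.11) translate directly into code; the sample forecast and observation values below are hypothetical.

```python
import numpy as np

def ty2_and_dis(f, x):
    """Type 2 Conditional Bias (Eq. 5.10) and Discrimination (Eq. 5.11)
    for probability forecasts f of a binary observation x."""
    f, x = np.asarray(f, float), np.asarray(x, int)
    mu_f = f.mean()
    t1 = (x == 1).mean()            # t(x = 1)
    t0 = 1.0 - t1                   # t(x = 0)
    mu_f_x0 = f[x == 0].mean()      # conditional mean of f given non-event
    mu_f_x1 = f[x == 1].mean()      # conditional mean of f given event
    ty2 = t0 * (mu_f_x0 - 0.0) ** 2 + t1 * (mu_f_x1 - 1.0) ** 2
    dis = t0 * (mu_f_x0 - mu_f) ** 2 + t1 * (mu_f_x1 - mu_f) ** 2
    return ty2, dis

ty2, dis = ty2_and_dis([0.2, 0.4, 0.6, 0.8], [0, 0, 1, 1])
# ty2 = 0.5 * 0.3**2 + 0.5 * 0.3**2 = 0.09
# dis = 0.5 * 0.2**2 + 0.5 * 0.2**2 = 0.04
```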
Next, the relative measures of the LBR decompositions for the September and May
monthly volume forecasts are examined. Examination of Figure 5.18 indicates that
the original forecasting system has no sharpness for extreme low flow events. As
the magnitude of the event increases, the sharpness also increases. The bias correction
Figure 5.15: Conditional distributions of the forecasts r(f|x = 0) (left) and r(f|x = 1) (right) by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time September monthly volume forecasts.
Figure 5.16: Conditional mean of the forecasts given the observations µf|x for five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time September monthly volume forecasts.
Figure 5.17: Conditional mean of the forecasts given the observations µf|x for the EBC and QM bias correction methods with NBC. The forecasts were issued for September monthly volume with 1-month lead time. The three curves for each colour in the two panels show µf|x=1, µf, and µf|x=0 from top to bottom.
Figure 5.18: Relative sharpness by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts.
methods make the shape of the sharpness curve more symmetrical. The Relative
Type 2 Conditional Bias (RTY2) shows the reverse of the shape of sharpness and
Discrimination (Figure 5.19). For example, the RTY2 of the original forecasts
decreases as the magnitude of the event increases. RTY2 also maintains its shape,
but not its magnitude, as the lead time increases. Increases in lead time result in
dramatic decreases in sharpness and discrimination. All the bias correction methods
improved RTY2 most for low and moderate flow events, which leads to the
improvement in the MSE Skill Score. As for the May monthly volume, the LBR
decompositions do not show much improvement by any bias correction method
(Figure B.2).
5.4.4 Results for All Months
To better characterize the bias correction methods, the performance measures
and the CR and LBR decompositions are calculated with verification datasets for
all months (N = 576). Since this sample size is much larger than the one
Figure 5.19: LBR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1, 2, and 3-month lead time September monthly volume forecasts.
for each month (N = 48), the estimated DO measures have smaller sampling
variability. However, the assumption that all pairs of observations and forecasts
obey the same joint distribution is likely invalid. Lead times of 1, 3, and 6 months
are considered.
The right panels of Figure 5.20 indicate that the forecasts without bias cor-
rection have more unconditional bias for low and moderate flow events than for
high flow events, in terms of the contribution to the Skill Score. All the bias
correction methods improve the unconditional bias for low and moderate flow
events. According to the left panels of Figure 5.20, at lead times of 3 months and
longer, all the bias correction methods tend to underestimate the occurrence of low
flow events (p ≤ 0.5) and overestimate the occurrence of high flow events (p ≥ 0.5).
This tendency seems to stem from the bias in the hydrological model, which tends
to underestimate high streamflow volumes and overestimate low streamflow
volumes. Still, since PSS, which has no hydrological model bias, also shows this
tendency, the estimation of the forecast conditional distribution Gt(y) itself might
have a problem.
The Relative Reliability (RREL) is reduced, especially for moderate flows,
and the Relative Resolution (RRES) is maintained at the same or a higher level
than the original forecasts achieve (Figure 5.21). The ordering of the methods by
RREL is almost the same from the 1-month through the 6-month lead time. For
extreme low flow events, RPF gives the worst Reliability among the methods at a
1-month lead time. It is speculated from Figure 5.4 that the power function does
not fit low flows well. Thus, the failure to improve the RREL for moderate and low
flows is the main reason why RPF obtains the worst Skill Score in this range. As
the lead time increases, the RRES of all the methods decreases.
QM has the smallest Relative Type 2 Conditional Bias (RTY2) and the largest
Relative Discrimination (RDIS) overall, whereas EBC issues forecasts with rel-
atively large RTY2 and the smallest RDIS (Figure 5.22). Since RDIS and Relative
Sharpness (RS) are closely related, RS shows almost the same characteristics of the
bias correction methods as RDIS (Figure 5.23). The other regression-type methods
fall between QM and EBC, although the regression-type methods tend to have
poorer sharpness for low flow events than EBC. As for the change in the measures
with lead time, the LBR decompositions show more dramatic decreases than the
CR decompositions. Note that forecasts with large RS need large RDIS to achieve
a high Skill Score; in other words, a small RDIS is enough for forecasts with small
RS to achieve the same Skill Score. This is why RLI,
Figure 5.20: Mean Error and unconditional bias from the decomposition of Skill Score by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for all the months with 1, 3, and 6-month lead times.
Figure 5.21: CR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for all the months with 1, 3, and 6-month lead times.
Figure 5.22: LBR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for all the months with 1, 3, and 6-month lead times.
EBC, and QM obtain almost the same Skill Score (Figure 5.24).
For this experimental forecasting system, it is clear that using the bias cor-
rection methods is better than doing nothing (Figure 5.24). The exception is RPF,
which produces poorer skill than NBC for low flow events. For the 1-month lead
time, RLI gives the best accuracy, and the second best is EBC. Beyond the
2-month lead time, QM becomes the best correction method in terms of accuracy,
Figure 5.23: Relative sharpness by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for all the months with 1, 3, and 6-month lead times.
and RLW and RLI show about the same accuracy. On the other hand, RPF keeps
giving less accuracy than NBC for low flows. Among the regression-type methods,
RLI is the best. This might be because RLI has some favorable features that take
into account the multiplicative bias for the meteorological event and the magnitude
of the simulated volume. In addition, RLI is much easier to implement than the
other two methods, RLW and RPF. Note that the MSE Skill Score drops at the
extreme high and low quantiles. Another important point is that the forecasts for
low flow events tend to be more accurate than those for high flow events. The Skill
Score for the 1-month lead time with PSS indicates the room for further
improvement achievable by bias correction methods. It should be noted that the
potential skill is also improved by the bias correction methods.
5.5 Summary and Conclusions
Three types of bias correction methods were applied to the ensemble volumes produced by the experimental forecasting system. The first is the Event-Bias Correction method (EBC): the multiplicative bias between the observed volume and the simulation driven by the historical meteorological input is first obtained, and this bias is then used to correct the ensemble volume simulated with the same historical meteorological event. The second type is a regression method, comprising linear interpolation between corresponding values of the historical simulation and observation (RLI), and power function (RPF) and LOWESS (locally weighted scatterplot smoothing) (RLW) regressions fitted to the scatter plot of observed flow against historical simulation; each ensemble volume is replaced by the expected observed volume given the simulated volume. The third is the Quantile Mapping method (QM): each ensemble volume is corrected so that the observed-flow distribution assigns the corrected volume the same cumulative relative frequency that the historical-simulation distribution assigns the original volume.
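As a concrete illustration of the last of these, the quantile mapping step can be sketched with empirical CDFs. This is a hypothetical minimal implementation, not the report's code; Weibull plotting positions are assumed, since the report does not specify them here:

```python
import numpy as np

def quantile_map(ensemble, hist_sim, hist_obs):
    """Quantile Mapping (QM) sketch: each ensemble volume is replaced by the
    observed volume having the same nonexceedance probability that the
    ensemble volume has under the historical-simulation empirical CDF."""
    hist_sim = np.sort(np.asarray(hist_sim, float))
    hist_obs = np.sort(np.asarray(hist_obs, float))
    n = len(hist_sim)
    p = np.arange(1, n + 1) / (n + 1.0)       # assumed Weibull plotting positions
    p_ens = np.interp(ensemble, hist_sim, p)  # nonexceedance prob. of each member
    return np.interp(p_ens, p, hist_obs)      # invert the observed-flow CDF
```

For example, if the historical simulation is uniformly biased high by a factor of two, QM maps each ensemble member back to half its value.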
In the investigation, some problems were found with the continuous approaches LRM and KDM. In LRM, the kernel density estimation method is used to estimate the marginal distribution of forecasts s(f). However, in situations where σ_f = 0 the optimum bandwidth cannot be estimated, which leads to failure in estimating s(f). This occurs for extreme low or high quantiles with small sample sizes. KDM has the same problem in estimating the conditional distribution of forecasts given observations, r(f|x). Another problem is that the logistic regression may produce unreasonable estimates of the conditional mean µ_{x|f} for forecasts of extreme
[Figure 5.24 panels: MSE Skill Score and squared correlation vs. nonexceedance probability p for 1-, 3-, and 6-month lead times; legend: PSS, NBC, RLI, RPF, RLW, EBC, QM.]

Figure 5.24: MSE Skill Score and potential skill by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for all the months with 1, 3, and 6-month lead times.
low or high flow events. The pseudoperfect streamflow simulation (PSS) was used to examine forecast quality without hydrological model bias. It also showed a tendency to overestimate the occurrence of low flow events and underestimate the occurrence of high flow events. Hence, the unconditional biases may stem from G_t(y).
Examination of the MSE Skill Score for the forecasts without bias correction indicates monthly variation. However, the MSE Skill Score for the historical simulation showed a monthly variation different from that of the probabilistic forecasts. Therefore, it is dangerous to simply assume that months with high relative accuracy in the historical simulation also have high relative accuracy in the probabilistic forecasts.
The DO measures vary distinctly as lead time increases. The Type 2 conditional bias increases more rapidly than the Bias and the Reliability (Type 1 conditional bias), which remain fairly constant with lead time. The Discrimination decreases more dramatically than the Resolution as lead time increases. The association ρ²_{fx} for forecasts with longer lead times is less affected by the bias in the hydrological model.
The characteristics of the bias correction methods were investigated. The decompositions of the Skill Score reveal that all the bias correction methods achieve a better skill score mostly by reducing the conditional bias (Reliability) and the unconditional bias (Mean Error). Therefore, if the model is well calibrated, not much improvement can be obtained. Surprisingly, in some cases the bias correction methods also improve the association. Reducing the Reliability while maintaining the Resolution is one of the characteristics of the bias correction methods. As for the LBR decompositions, the improvement in Skill Score by the bias corrections is achieved mostly by reducing the Type 2 conditional bias. From the estimated distribution s(f) and the relative sharpness, it can be seen that the bias correction methods shift density from 0 toward the middle, although this does not appear at all thresholds. The clearest characteristic of the bias correction methods is that the ensembles corrected by EBC give the lowest sharpness over all the quantiles, whereas QM gives the highest sharpness and discrimination; the regression-type methods fall between these two. RLI gives the best improvement in accuracy for a 1-month lead time, while beyond a 2-month lead time the QM method has the best accuracy. RPF gives lower accuracy than the original ensembles for the low flow quantiles. RLW and RLI produce almost the same accuracy beyond a 2-month lead time. Hence, it is clear that the DO approach is a useful, sound framework for assessing bias correction methods for probabilistic streamflow forecasts.
CHAPTER 6
SUMMARY AND CONCLUSIONS
One objective of this research is to extend the Distributions-Oriented (DO) approach to the verification of probability distribution forecasts of streamflow. The Advanced Hydrologic Prediction Services (AHPS) forecasts from the National Weather Service (NWS) utilize the idea of Extended Streamflow Prediction (ESP). First, the hydrological model embedded in the forecasting system produces ensemble traces (different realizations of future streamflow) by inputting historical meteorological information. Statistical analysis of the ensemble volumes then produces probability distribution forecasts. Verifying probability distribution forecasts is a problem, since they contain a probabilistic forecast for every possible outcome. One solution was proposed: consider the discrete event that a forecast variable is less than or equal to a threshold. By setting up the threshold, one probabilistic forecast is derived from the probability distribution forecast as a nonexceedance probability, and the corresponding continuous observation is converted into 0 (no occurrence) or 1 (occurrence). This pair of probabilistic forecast and discrete observation becomes part of the verification dataset, and many verification datasets are obtained by setting up many thresholds. Investigation of this set of verification datasets was considered equivalent to examination of the forecast quality of the probability distribution forecast.
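The recoding step described above can be sketched as follows; ensemble traces stand in for the probability distribution forecast, and the function name is illustrative only:

```python
import numpy as np

def recode(ensembles, observations, threshold):
    """For one threshold, derive the probabilistic forecast of the event
    {volume <= threshold} from each ensemble, and convert the continuous
    observation into 0 (no occurrence) or 1 (occurrence)."""
    f = np.array([np.mean(np.asarray(ens) <= threshold) for ens in ensembles])
    x = (np.asarray(observations) <= threshold).astype(int)
    return f, x  # one verification dataset; repeat over many thresholds
```

Sweeping the threshold over many values yields the family of verification datasets used throughout the report.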
6.1 Distributions-Oriented Methods for Small Verification Datasets
The verification of streamflow forecasts suffers from small sample sizes. Since applying the DO approach is equivalent to estimating the joint distribution of forecasts and observations, actual implementation of the DO approach for streamflow forecasts faces serious estimation problems related to small samples. The difficulty in estimation is expressed by the dimensionality D, defined as the number of degrees of freedom needed to estimate the joint distribution of forecasts and observations. For instance, if probabilistic forecasts are issued from 0 to 1 at 0.1 intervals for dichotomous observations, D is equal to 11 × 2 − 1 = 21.
However, in the case of May-September seasonal volume forecasts, the sample size available to estimate the joint distribution could be around 50, because many gaging stations have a short period of record and events occur once a year. In order to reduce the dimensionality D, a continuous approach was introduced. All the measures except the CR decompositions were derived from six basic statistics; the CR decompositions required estimation of the integral

$$\int_0^1 \mu_{x|f}^2\, s(f)\, df,$$

where µ_{x|f} is the conditional mean of observations given forecasts and s(f) denotes the marginal distribution of forecasts. Three methods, LRM, KDM, and CM, were considered to estimate the integral: LRM uses the arithmetic mean with logistic regression, KDM uses numerical integration with kernel density estimation (a nonparametric method), and CM uses numerical integration with both logistic regression and kernel density estimation. Compared with an 11-bin contingency-table model (the discrete approach, DSC), the continuous approaches reduce the dimensionality by about one-third to two-thirds (D = 9 or 7, versus 21).
Three Monte Carlo experiments were carried out to investigate the three continuous approaches. One experiment used an analytical model for the joint distribution, another used a stochastic model of the streamflow forecasting system, and the third used a discrete joint distribution model. Verification datasets were generated to see how the estimated measures of forecast quality vary with the number of forecast-observation pairs, which was varied from 50 to 1000. It turned out that the continuous approach with LRM is the best estimator of the CR decompositions for small samples, whether the forecasts are issued as discrete or continuous numbers. LRM is also the best estimator for forecasts issued for extreme events with small sample sizes. KDM is likewise a better estimator than DSC for small samples, and works better than LRM for forecasts issued for moderate events. A reason for the improvement over DSC is that the continuous approaches impose some structure on the estimation of the marginal distribution of forecasts or the conditional distribution of forecasts given observations.
6.2 Assessment of Bias Correction Methods for Ensemble Forecasts
The second objective of this research is to demonstrate the usefulness of the DO approach in assessing the quality of streamflow forecasts. The forecast of interest is the probabilistic forecast of monthly streamflow volume observed at Stratford on the Des Moines River. Three different types of bias correction methods are applied to the ensemble volumes. The Event-Bias Correction method (EBC) expects an ensemble volume simulated with the i-th year's meteorological conditions to have the same bias as the i-th year's historical simulation, regardless of the magnitude of the ensemble volume. A regression method replaces an ensemble volume with the expected observation given that volume; the regression is obtained from observations and corresponding historical simulations, and Linear Interpolation (RLI), Power Function (RPF), and LOWESS (RLW) forms are considered. The Quantile Mapping method (QM) corrects an ensemble volume based on the cumulative distributions of observations and historical simulations, so that the ensemble volume and the corrected volume have the same nonexceedance probability.
In the investigation, the major characteristic common to the bias correction methods was shown by the decompositions of the Skill Score: all the bias correction methods achieve better skill scores mostly by reducing the conditional bias (Reliability) and the unconditional bias (Mean Error). Therefore, if the model is well calibrated, not much improvement may be obtained. It is remarkable that in some cases the bias correction methods also improve the association. Reducing the Reliability while maintaining the Resolution is another characteristic of the bias correction methods. As for the LBR decompositions, the improvement in Skill Score by the bias corrections is achieved mostly by reducing the Type 2 conditional bias. A distinct characteristic of the bias correction methods is that the forecasts modified by EBC tend to have the lowest sharpness and discrimination over all the quantiles, whereas QM tends to give the highest sharpness and discrimination; the regression-type methods fall between these two. Thus, the DO approach enabled us to obtain insights into the forecasts produced by the various bias correction methods. Some problems were found with the continuous approaches LRM and KDM when the forecasts were issued for extreme low or high quantiles with small sample sizes. For example, an event in which the monthly volume is at or below the 0.05 quantile will fail to occur in 47 or 48 out of 50 samples, and a forecasting system with moderate skill could then issue a probability of 0 for 49 or all of the events. In these cases, the logistic regression and the kernel density estimation cannot estimate µ_{x|f} properly.
6.3 Future Study and Remarks
The first objective of this research was to extend the DO approach to the verification of probability distribution forecasts. We suggested recoding the probability distribution forecasts by setting up many thresholds. Another possibility for assessing probability distribution forecasts is the continuous ranked probability score (CRPS), which is the integral of the Brier score over all possible threshold values (Hersbach, 2000). However, since the CRPS is a single scalar, it cannot convey how forecast quality varies over the range of possible outcomes. Although the CR decomposition of the CRPS has been derived by Hersbach (2000), a decomposition corresponding to the LBR decomposition of the Brier score should also be derived in order to examine forecast quality given the observations. The score proposed by Wilson et al. (1999), which takes the form of the probability of occurrence of the observation given the EPS distribution, would be useful for seeing how bias correction methods or the incorporation of climate forecasts change the distribution of ensemble volumes.
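To make the threshold-integral view of the CRPS concrete, the score for an ensemble forecast can be approximated by integrating the squared difference between the forecast CDF and the observation step function over a threshold grid. This is an illustrative sketch (trapezoidal quadrature on a user-supplied grid), not Hersbach's estimator:

```python
import numpy as np

def crps_ensemble(ensemble, obs, thresholds):
    """Approximate CRPS as the Brier score integrated over thresholds y:
    integral of {F(y) - H(y - obs)}^2 dy, with F the ensemble CDF and H
    the step function of the observation."""
    ensemble = np.asarray(ensemble, float)
    F = np.array([np.mean(ensemble <= y) for y in thresholds])  # forecast CDF
    H = (thresholds >= obs).astype(float)                        # observation step
    sq = (F - H) ** 2                                            # Brier score at y
    return np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(thresholds))
```

For a forecast concentrated at a single value m, the score reduces to |m − obs|, which makes the CRPS's interpretation as a generalized absolute error visible.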
An important point in applying the verification framework with the DO approach is that it assumes stationarity of the streamflow and the forecasts, and no serial correlation in the streamflow or forecast time series. The joint distribution includes all of the non-time-dependent information relevant to forecast verification (Murphy, 1991). In other words, this framework cannot measure how aspects of forecast quality evolve over time. This limitation could motivate extensions of the DO approach. For example, where the observations are not stationary, they might be divided into stationary groups before applying the DO approach. If daily streamflow volume forecasts are considered, the serial correlation cannot be ignored. However, even if nonstationarity or serial correlation were successfully detected, another large problem would remain: a more complex model of the relationship between forecasts and observations would have even higher dimensionality than the joint distribution model studied here.
The problem of boundary effects with the kernel density estimation method is still under research, and this work did not utilize the best technique now available. However, the kernel density estimation method is clearly more flexible than logistic regression; moreover, it avoids the error introduced by choosing an improper distribution function. Innovations in kernel density estimation techniques should be followed and adapted into the forecast verification methodology.
This research used an experimental forecasting system that utilizes ensemble volumes produced with all the meteorological information, regardless of the forecast date. In reality, ESP can only utilize meteorological information recorded before the forecast date to produce the ensemble traces. The effect of the number of available ensemble traces on the skill of the forecasting system should be investigated.
In the assessment of bias correction methods, the MSE Skill Score for the forecasts before bias correction was examined over all the months. In addition, the historical simulation and the observations (used to develop the bias-correction functions) were compared to estimate the MSE Skill Score for the simulation period. Note that the MSE Skill Score for the probabilistic forecasts showed monthly variations different from those for the historical simulation. One possible area of investigation would be to use the DO approach to examine the joint distribution of historical simulations and observations more closely. Further comparisons may indicate how the quality of the model predictions is related to the forecast quality of the probabilistic forecasts from ESP.
How best to incorporate climate forecast information into ESP is another issue to be studied. In this research, equal weighting of the ensemble traces was used; no climate forecast information was utilized. Although the use of climate information has been investigated (e.g., Croley II, 2000; Perica, 1998), it is not clear that these methods for incorporating weather or climate forecasts are truly beneficial or effective in improving streamflow forecasts. As was demonstrated for the bias-correction methods, application of the extended DO approach would be useful for evaluating alternative approaches to using climate information in streamflow forecasting.
APPENDIX A
STATISTICAL METHODS
This appendix describes the three statistical methods (LRM, KDM, and CM) for estimating the integral

$$\int_0^1 \mu_{x|f}^2\, s(f)\, df.$$

These methods reduce the dimensionality of the verification problem. The traditional discrete approach with a contingency table (DSC) is also described.
A.1 Logistic Regression Method
The logistic regression for one explanatory variable, the probabilistic forecast f, is expressed with two parameters β0 and β1 as

$$\ln\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 f \qquad (A.1)$$
where π is the probability that the discretized observation X is equal to 1. Estimation of the two parameters is as follows. Since the probability that X is equal to 0 or 1 can be written as

$$\Pr(X = x) = \frac{\exp\{x(\beta_0 + \beta_1 f)\}}{1 + \exp(\beta_0 + \beta_1 f)}, \qquad x = 0 \text{ or } 1, \qquad (A.2)$$

the likelihood function can be formulated as

$$\Pr(X_1 = x_1, X_2 = x_2, \cdots, X_N = x_N) = \frac{\exp\left\{\sum_{i=1}^{N} x_i(\beta_0 + \beta_1 f_i)\right\}}{\prod_{i=1}^{N}\left\{1 + \exp(\beta_0 + \beta_1 f_i)\right\}}. \qquad (A.3)$$
Then the logarithm of the above equation is

$$F(\beta_0, \beta_1) = \ln \Pr = \sum_{i=1}^{N}\left[x_i(\beta_0 + \beta_1 f_i) - \ln\{1 + \exp(\beta_0 + \beta_1 f_i)\}\right]. \qquad (A.4)$$

To maximize this log-likelihood function, the Levenberg-Marquardt optimization method is used to minimize −F.
The estimator of the conditional mean µ_{x|f} for a sample i can be written with the two parameters:

$$\mu_{x|f_i} = \Pr(X_i = 1) = \frac{\exp(\beta_0 + \beta_1 f_i)}{1 + \exp(\beta_0 + \beta_1 f_i)}. \qquad (A.5)$$

Figure A.1 illustrates an example of logistic regression applied to pairs of forecasts and observations generated by the analytical model for the joint distribution developed in Section 4.2. The estimates of µ²_{x|f_i} corresponding to each forecast f_i are obtained, and the sample average (Equation (3.36)) is used to estimate the integral. Since the marginal distribution of the forecasts s(f) is graphically informative, it is estimated directly by the kernel density estimation method explained later. Therefore, the dimensionality of LRM is D = 9, since 6 basic statistics, 2 logistic regression parameters, and 1 kernel density estimation parameter have to be estimated.

[Figure A.1: logistic regression curve fitted through pairs of forecasts and 0/1 observations.]

Figure A.1: Example of logistic regression applied to the pairs of forecasts and observations.
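A minimal sketch of the LRM estimation follows. Newton-Raphson iteration stands in for the Levenberg-Marquardt routine used in the report, and the data-generating line in any usage is illustrative:

```python
import numpy as np

def fit_logistic(f, x, iters=25):
    """Maximize the log-likelihood F(beta0, beta1) of Equation (A.4) by
    Newton-Raphson (a stand-in for Levenberg-Marquardt)."""
    X = np.column_stack([np.ones_like(f), f])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))            # current mu_{x|f_i}
        H = X.T @ (X * (p * (1 - p))[:, None])           # negative Hessian of F
        beta = beta + np.linalg.solve(H, X.T @ (x - p))  # Newton step
    return beta

def integral_lrm(f, x):
    """Estimate int_0^1 mu_{x|f}^2 s(f) df by the sample average of
    mu_{x|f_i}^2 over the observed forecasts (Equation (3.36))."""
    b0, b1 = fit_logistic(f, x)
    mu = 1.0 / (1.0 + np.exp(-(b0 + b1 * f)))
    return np.mean(mu ** 2)
```

For reliable forecasts (where µ_{x|f} ≈ f and f is uniform), the estimated integral should fall near E[f²] = 1/3.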
A.2 Kernel Density Estimation Method
The basic kernel density estimator is written as

$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \qquad (A.6)$$

where h denotes the bandwidth, n is the number of samples, and K(·) is a kernel. The biweight kernel is utilized in this research:

$$K(t) = \frac{15}{16}(1 - t^2)^2, \qquad |t| \le 1. \qquad (A.7)$$
[Figure A.2 panels: PDF and CDF of r(f|x = 0) and r(f|x = 1) by kernel estimation, plotted over probabilistic forecast values from −1 to 2.]

Figure A.2: Unbounded estimation with the biweight kernel.
The optimum bandwidth is obtained as follows. Equivalent bandwidth scaling is often used to convert the bandwidth obtained from a Normal-kernel cross-validation rule into that for another kernel (Scott 1992, p. 142). Here, the bandwidth is determined through a Normal reference rule that minimizes the asymptotic mean integrated squared error (AMISE) between the normal distribution and the normal kernel estimate:

$$h^{*} = (4/3)^{1/5}\, \sigma\, n^{-1/5}. \qquad (A.8)$$

Finally, h* is multiplied by the equivalent bandwidth scaling factor for a biweight kernel, 2.623, to produce h.
As an example, fifty pairs of forecasts and observations, produced by the stochastic model of the streamflow forecasting system explained in Section 4.3, are used for kernel density estimation. Figure A.2 shows the kernel density estimate without any consideration of the boundaries f = 0 and f = 1. The difficulty with kernel estimation arises at these boundaries: the estimate at x = ch or x = 1 − ch, for 0 ≤ c < 1, is not necessarily a consistent estimate of f(x). This is known as the boundary effect. Some well-known methods to deal with this problem are the reflection method, the boundary kernel method, the boundary kernel method implicit in local linear fitting, the transformation method, and the pseudodata method (e.g., see Zhang et al. 1999). In the following, two of these simple methods are examined.
First, the boundary kernel method is applied. The kernel is designed from the biweight kernel so as to eliminate the O(h) term in the bias (Scott 1992, p. 146):

$$K_c(t) = \frac{3}{4}\left[(c+1) - \frac{5}{4}(1+2c)(t-c)^2\right]\left[t - (c+2)\right]^2 I_{[c,\,c+2]}(t). \qquad (A.9)$$

In the range x_i ∈ [0, h), the kernel K_c with c = (0 − x_i)/h is used instead of the ordinary biweight kernel. For the other boundary at x = 1, the distance from x = 1 is measured in the negative direction for the samples x_i ∈ (1 − h, 1], and the above kernel is applied with the remeasured distance. However, this kernel can produce negative values, as shown in Figure A.3, which is a known drawback of the method.
The second method is the reflection method. This method literally reflects the original samples about each boundary and then applies the ordinary kernel to the augmented sample. The final estimate is obtained by tripling the estimates on 0 ≤ x ≤ 1, since the original samples have been tripled (see Scott 1992). Figure A.4 illustrates the limitation of this method: it is specially designed for the case where the first derivative of f at the boundary is 0.
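A minimal sketch of the reflection estimator on [0, 1], using the biweight kernel and the Normal reference bandwidth of Equation (A.8) scaled by 2.623; the function name and the test data are illustrative:

```python
import numpy as np

def biweight_kde_reflect(samples, grid):
    """Biweight-kernel density estimate on [0, 1] with reflection: samples
    are mirrored about 0 and 1, the ordinary estimate is formed from the
    tripled sample, and the result on [0, 1] is tripled."""
    s = np.asarray(samples, float)
    n = len(s)
    # Normal reference bandwidth (A.8), scaled for the biweight kernel
    h = 2.623 * (4.0 / 3.0) ** 0.2 * np.std(s) * n ** (-0.2)
    data = np.concatenate([s, -s, 2.0 - s])       # reflect at both boundaries
    t = (np.asarray(grid)[:, None] - data[None, :]) / h
    K = np.where(np.abs(t) <= 1.0, (15.0 / 16.0) * (1.0 - t ** 2) ** 2, 0.0)
    # ordinary estimate from the 3n samples, then tripled on [0, 1]
    return 3.0 * K.sum(axis=1) / (3.0 * n * h)
```

For a density that is flat at the boundaries, such as the uniform, the reflected estimate remains consistent right up to f = 0 and f = 1, and the estimated density still integrates to one over [0, 1].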
Zhang et al. (1999) have proposed a new method for mitigating the boundary effect, combining the pseudodata, transformation, and reflection methods. Since kernel density estimation is still under development, it is reasonable for this research to use the simple reflection method to illustrate the usefulness of kernel density estimation in the verification problem, specifically the reduction of dimensionality.
The estimated conditional distribution r(f|x), the marginal distribution s(f), and the conditional mean µ_{x|f} are connected by Equations (3.37) and (3.39). The numerical integral form of Equation (3.34) can be given as

$$\int_0^1 \mu_{x|f}^2\, s(f)\, df \approx \sum_t \frac{1}{2}\left\{\mu_{x|f=t}^2\, s(f=t) + \mu_{x|f=t+\Delta t}^2\, s(f=t+\Delta t)\right\}\Delta t \qquad (A.10)$$
[Figure A.3 panels: PDF and CDF of r(f|x = 0) and r(f|x = 1) by kernel estimation with the boundary kernel; the PDFs dip below zero near the boundaries.]

Figure A.3: Bounded estimation with floating boundary kernel.
[Figure A.4 panels: PDF and CDF of r(f|x = 0) and r(f|x = 1) by kernel estimation with the reflection technique, before and after restriction to [0, 1].]

Figure A.4: Bounded estimation with biweight kernel and reflection boundary technique.
[Figure A.5 panel: estimated s(f) versus probabilistic forecast, sample size 50.]

Figure A.5: Example of the kernel density estimation method applied to forecasts to estimate the marginal distribution s(f).
where t takes values from 0.0 to 0.999 with increment ∆t = 0.001. This method has dimensionality D = 7, since the kernel density estimation method has one parameter, the bandwidth.
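The quadrature in Equation (A.10) is the ordinary trapezoidal rule; a small sketch with hypothetical callables standing in for the estimated µ_{x|f} and s(f):

```python
import numpy as np

def integral_cr(mu, s, dt=0.001):
    """Trapezoidal evaluation of int_0^1 mu_{x|f}^2 s(f) df on the grid
    t = 0, dt, 2*dt, ..., 1 (Equation (A.10))."""
    t = np.linspace(0.0, 1.0, int(round(1.0 / dt)) + 1)
    y = mu(t) ** 2 * s(t)
    return np.sum(0.5 * (y[1:] + y[:-1]) * dt)
```

With perfectly reliable forecasts (µ_{x|f} = f) and uniform s(f), the integral is E[f²] = 1/3, which the quadrature reproduces to high accuracy.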
A.3 Combination Method
The marginal distribution of the forecasts s(f) is estimated directly by the
kernel density estimation method in the same manner as the one for the condi-
tional distribution r(f |x). Figure A.5 shows an example of s(f) estimated with
the forecasts generated by the analytical model for joint distribution in Section 4.2.
With µx|f estimated by the logistic regression, the integral of CR decompositions
is numerically estimated by Equation (A.10). The dimensionality of this method is
D = 9, which is the same as LRM.
A.4 Contingency Table Approach
The forecasts originally issued as continuous numbers are converted into 11 discrete values, {f | 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}, by rounding to the nearest discrete value. In the case of the analysis with the discrete joint distribution in Section 4.4, one more discrete value, 0.05, is added to correspond to the 12 original discrete forecast values. Thus, the joint distribution has 11 (or 12) × 2 probabilities to be estimated, and the dimensionality is D = 11 (or 12) × 2 − 1 = 21 (or 23). All the measures of forecast quality are calculated from the joint relative frequencies based on their definitions.
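The discretization step can be sketched as follows; the function name is illustrative, and the 12-value variant with 0.05 described above is omitted for brevity:

```python
import numpy as np

def joint_relative_frequency(f, x):
    """DSC sketch: round each forecast to the nearest of the 11 values
    0.0, 0.1, ..., 1.0 and tabulate the joint relative frequency with the
    dichotomous observation (an 11 x 2 contingency table, D = 21)."""
    bins = np.round(np.asarray(f, float) * 10.0).astype(int)  # bin index 0..10
    table = np.zeros((11, 2))
    for b, xi in zip(bins, x):
        table[b, int(xi)] += 1.0
    return table / len(bins)  # joint relative frequencies, summing to 1
```

All DO measures of forecast quality can then be computed from this table by their definitions.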
APPENDIX B
SELECTED FIGURES AND TABLES
[Figure B.1 panels: Reliability and Resolution vs. nonexceedance probability p, May flow with 1-month lead time (N=48); legend: PSS, NBC, RLI, RPF, RLW, EBC, QM.]

Figure B.1: CR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time May monthly volume forecasts.
[Figure B.2 panels: Type 2 Conditional Bias, Discrimination, and Relative Sharpness vs. nonexceedance probability p, May flow with 1-month lead time (N=48); legend: PSS, NBC, RLI, RPF, RLW, EBC, QM.]

Figure B.2: LBR decompositions by five Bias Correction methods, actual (non bias-corrected) streamflow simulation (NBC), and pseudoperfect streamflow simulation (PSS), for 1-month lead time May monthly volume forecasts.
Table B.1: BIAS in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model of the joint distribution.
Size DSC LRM KDM CM
50 9.172e-002 4.930e-002 −1.273e-002 1.122e-001
100 4.381e-002 2.699e-002 −2.235e-002 7.343e-002
200 2.203e-002 2.071e-002 -2.074e-002 5.426e-002
400 9.793e-003 1.552e-002 -1.892e-002 4.019e-002
600 4.612e-003 1.293e-002 -1.830e-002 3.371e-002
800 3.413e-003 1.289e-002 -1.622e-002 3.116e-002
1000 2.596e-003 1.269e-002 -1.476e-002 2.943e-002
Note: the underlined value is the closest to 0 in the row.
Table B.2: Standard deviation in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model of the joint distribution.
Size DSC LRM KDM CM
50 6.459e-002 6.378e-002 5.406e-002 8.395e-002
100 4.227e-002 4.051e-002 3.763e-002 4.843e-002
200 3.098e-002 2.776e-002 2.723e-002 3.055e-002
400 2.275e-002 2.037e-002 2.098e-002 2.109e-002
600 1.753e-002 1.573e-002 1.629e-002 1.643e-002
800 1.475e-002 1.340e-002 1.394e-002 1.377e-002
1000 1.405e-002 1.257e-002 1.318e-002 1.283e-002
Note: the underlined value is the smallest in the row.
Table B.3: BIAS in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model of the joint distribution.
Size DSC LRM KDM CM
50 6.917e-002 3.485e-002 −2.718e-002 9.772e-002
100 3.285e-002 2.171e-002 -2.764e-002 6.814e-002
200 1.585e-002 1.991e-002 -2.153e-002 5.346e-002
400 4.027e-003 1.448e-002 -1.996e-002 3.915e-002
600 -3.946e-003 9.323e-003 −2.191e-002 3.011e-002
800 -3.037e-003 1.106e-002 -1.805e-002 2.933e-002
1000 -4.020e-003 1.056e-002 -1.689e-002 2.731e-002
Note: the underlined value is the closest to 0 in the row.
Table B.4: Standard deviation in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.25 by the analytical model of the joint distribution.
Size DSC LRM KDM CM
50 1.653e-001 1.727e-001 1.609e-001 1.851e-001
100 1.138e-001 1.165e-001 1.121e-001 1.207e-001
200 8.534e-002 8.430e-002 8.267e-002 8.506e-002
400 6.129e-002 5.981e-002 5.961e-002 5.998e-002
600 5.097e-002 5.030e-002 4.996e-002 5.056e-002
800 4.209e-002 4.174e-002 4.159e-002 4.181e-002
1000 3.969e-002 3.866e-002 3.876e-002 3.865e-002
Note: the underlined value is the smallest in the row.
Table B.5: BIAS in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model of the joint distribution.
Size DSC LRM KDM CM
50 2.166e-001 1.260e-001 1.711e-001 5.536e-001
100 1.477e-001 7.242e-002 1.436e-001 1.303e-001
200 9.639e-002 4.265e-002 1.041e-001 5.722e-002
400 5.795e-002 2.948e-002 7.235e-002 3.979e-002
600 4.206e-002 2.473e-002 5.860e-002 3.341e-002
800 3.410e-002 2.314e-002 5.183e-002 3.084e-002
1000 2.793e-002 2.163e-002 4.540e-002 2.849e-002
Note: the underlined value is the closest to 0 in the row.
Table B.6: Standard deviation in REL/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model of the joint distribution.
Size DSC LRM KDM CM
50 1.462e-001 1.087e-001 1.732e-001 1.371e+000
100 8.445e-002 5.513e-002 8.784e-002 4.518e-001
200 5.068e-002 2.886e-002 4.715e-002 3.296e-002
400 3.087e-002 1.593e-002 2.845e-002 1.765e-002
600 2.339e-002 1.141e-002 2.021e-002 1.248e-002
800 1.927e-002 9.468e-003 1.690e-002 1.036e-002
1000 1.698e-002 7.810e-003 1.507e-002 8.339e-003
Note: the underlined value is the smallest in the row.
Table B.7: BIAS in RES/σ_x^2 for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model of the joint distribution.
Size DSC LRM KDM CM
50 1.944e-001 1.136e-001 1.587e-001 5.412e-001
100 1.240e-001 5.738e-002 1.286e-001 1.153e-001
200 8.001e-002 3.608e-002 9.749e-002 5.066e-002
400 4.559e-002 2.720e-002 7.008e-002 3.752e-002
600 2.871e-002 2.241e-002 5.628e-002 3.109e-002
800 2.338e-002 2.363e-002 5.232e-002 3.133e-002
1000 1.468e-002 1.885e-002 4.263e-002 2.572e-002
Note: the underlined value is the closest to 0 in the row.
Table B.8: Standard Deviation in RES/σx² for the forecasts generated for nonexceedance probability p = 0.05 by the analytical model of the joint distribution.
Size DSC LRM KDM CM
50 3.638e-001 3.334e-001 3.711e-001 1.334e+000
100 2.403e-001 2.182e-001 2.396e-001 4.789e-001
200 1.561e-001 1.538e-001 1.611e-001 1.609e-001
400 1.038e-001 1.044e-001 1.071e-001 1.078e-001
600 8.397e-002 8.516e-002 8.658e-002 8.740e-002
800 7.484e-002 7.777e-002 7.722e-002 7.957e-002
1000 6.561e-002 6.689e-002 6.688e-002 6.824e-002
Note: the underlined value is the smallest in the row.
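The BIAS and Standard Deviation entries in these tables come from repeated sampling experiments: for each sample size, many verification samples are drawn from a known model, the REL and RES estimators are applied to each sample, and the mean error and spread of the estimates are tabulated. The procedure can be sketched as follows. This is an illustrative example only: it uses a simple binning estimator and a synthetic, perfectly reliable forecasting system (f ~ U(0,1), x ~ Bernoulli(f)) with a known true RES/σx² = 1/3, not the report's analytical model or its DSC/LRM/KDM/CM estimators; the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def rel_res(f, x, bins=10):
    """Binning estimator of the reliability (REL) and resolution (RES)
    terms of the calibration-refinement decomposition of the MSE,
    for probability forecasts f and binary observations x."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(f, edges) - 1, 0, bins - 1)
    xbar = x.mean()
    rel = res = 0.0
    for k in range(bins):
        m = idx == k
        if not m.any():
            continue
        # REL: squared miscalibration within the bin, weighted by bin count
        rel += m.sum() * (f[m].mean() - x[m].mean()) ** 2
        # RES: departure of the bin's observed frequency from climatology
        res += m.sum() * (x[m].mean() - xbar) ** 2
    return rel / f.size, res / f.size

def sampling_stats(n, reps=2000):
    """Monte Carlo BIAS and Standard Deviation of RES/sigma_x^2 for a
    perfectly reliable system: f ~ U(0,1), x ~ Bernoulli(f).
    True values: sigma_x^2 = 1/4, RES = Var(f) = 1/12, so RES/sigma_x^2 = 1/3."""
    true_res_norm = (1.0 / 12.0) / 0.25
    est = np.empty(reps)
    for i in range(reps):
        f = rng.uniform(size=n)
        x = (rng.uniform(size=n) < f).astype(float)
        est[i] = rel_res(f, x)[1] / 0.25  # normalize by the true sigma_x^2
    return est.mean() - true_res_norm, est.std(ddof=1)

for n in (50, 200, 1000):
    bias, sd = sampling_stats(n)
    print(f"n={n:5d}  BIAS={bias:+.3e}  SD={sd:.3e}")
```

As in the tables, the standard deviation of the estimates shrinks roughly as the square root of the sample size, while the binning estimator shows a positive small-sample bias in RES that decays as n grows.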
REFERENCES
Anderson, Jeffrey L., A Method for Producing and Evaluating Probabilistic Forecasts from Ensemble Model Integrations, Journal of Climate, 9 (7), 1518–1530, 1996.
Bae, Deg Hyo and Konstantine P. Georgakakos, Hydrologic Modeling for Flow Forecasting and Climate Studies in Large Drainage Basins, IIHR Report No. 360, The University of Iowa, Iowa City, Iowa, 1992.
Bicknell, B. R., J. C. Imhoff, J. L. Kittle, Jr., A. S. Donigian, Jr., and R. C. Johanson, Hydrological Simulation Program–Fortran: User's Manual for Version 11, U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, Georgia, 1997.
Bradley, A. A. and S. S. Schwartz, Evaluating the Impact of Climate Forecast Information on Probabilistic Streamflow Forecast, EOS Transactions, American Geophysical Union, 81 Supplement, Washington, D.C., May 2000; Abstract H32B-09.
Buizza, R. and T. N. Palmer, Impact of ensemble size on ensemble prediction, Monthly Weather Review, 126 (9), 2503–2518, 1998.
Cleveland, W. S., Robust Locally Weighted Regression and Smoothing Scatterplots, Journal of the American Statistical Association, 74 (12), 829–836, 1979.
Croley II, Thomas E., Using Meteorology Probability Forecasts in Operational Hydrology; American Society of Civil Engineers: Reston, Virginia, 2000.
Day, G. N., Extended Streamflow Forecasting Using NWSRFS, Journal of Water Resources Planning and Management, 111 (2), 157–170, 1985.
Donigian, A. S., Jr., J. C. Imhoff, Brian Bicknell and J. L. Kittle, Jr., Application Guide for Hydrological Simulation Program–Fortran (HSPF), U.S. Environmental Protection Agency, Environmental Research Laboratory, Athens, Georgia, 1984.
Doswell III, Charles A., Robert Davies-Jones and David L. Keller, On Summary Measures of Skill in Rare Event Forecasting Based on Contingency Tables, Weather and Forecasting, 5 (12), 576–585, 1990.
Hamill, Thomas M. and Stephen J. Colucci, Verification of Eta-RSM Short-Range Ensemble Forecasts, Monthly Weather Review, 125 (6), 1312–1327, 1997.
Helsel, D. R. and R. M. Hirsch, Statistical Methods in Water Resources; Elsevier: New York, 1992.
Hersbach, Hans, Decomposition of the continuous ranked probability score for ensemble prediction systems, Weather and Forecasting, 15 (5), 559–570, 2000.
Hou, Dingchen, Eugenia Kalnay and Kelvin K. Droegemeier, Objective Verification of the SAMEX '98 Ensemble Forecasts, Monthly Weather Review, 129 (1), 73–91, 2001.
Kottegoda, N. T. and R. Rosso, Statistics, Probability, and Reliability for Civil and Environmental Engineers; McGraw-Hill, Inc.: New York, 1997.
Marzban, Caren, Scalar Measures of Performance in Rare-Event Situations, Weather and Forecasting, 13 (9), 753–763, 1998.
Murphy, Allan H., Forecast verification, Economic Value of Weather and Climate Forecasts, Katz, Richard W. and Allan H. Murphy, editors; Cambridge University Press: New York, 19–74, 1997.
Murphy, Allan H., Forecast verification: its complexity and dimensionality, Monthly Weather Review, 119 (7), 1590–1601, 1991.
Murphy, Allan H., Skill scores based on the mean square error and their relationships to the correlation coefficient, Monthly Weather Review, 116 (12), 2417–2424, 1988.
Murphy, Allan H., What is a good forecast? An essay on the nature of goodness in weather forecasting, Weather and Forecasting, 8 (2), 281–293, 1993.
Murphy, Allan H. and E. S. Epstein, Skill scores and correlation coefficients in model verification, Monthly Weather Review, 117 (3), 572–581, 1989.
Murphy, Allan H. and Daniel S. Wilks, A Case Study of the Use of Statistical Models in Forecast Verification: Precipitation Probability Forecasts, Weather and Forecasting, 13 (3), 795–810, 1998.
Murphy, Allan H. and Robert L. Winkler, Diagnostic verification of probability forecasts, International Journal of Forecasting, 7, 435–455, 1992.
Murphy, Allan H. and Robert L. Winkler, A General Framework for Forecast Verification, Monthly Weather Review, 115 (7), 1330–1338, 1987.
Perica, Sanja, Integration of Meteorological Forecasts/Climate Outlooks Into Extended Streamflow Prediction (ESP) System, http://www.nws.noaa.gov/oh/hrl/papers/ams/ams98-6.htm (accessed March 10, 1998).
Scott, David W., Multivariate Density Estimation; John Wiley & Sons, Inc.: New York, 1992.
Shuttleworth, W. James, Evaporation, Handbook of Hydrology, Maidment, David R., editor; McGraw-Hill, Inc.: New York, 4.1–4.53, 1993.
Smith, J. A., G. N. Day and M. D. Kane, Nonparametric Framework for Long-Range Streamflow Forecasting, Journal of Water Resources Planning and Management, 118 (1), 82–91, 1992.
Stensrud, David J. and Matthew S. Wandishin, The Correspondence Ratio in Forecast Evaluation, Weather and Forecasting, 15 (10), 593–602, 2000.
Stephenson, David B., Use of the "Odds Ratio" for Diagnosing Forecast Skill, Weather and Forecasting, 15 (4), 221–232, 2000.
Wilks, D. S., Diagnostic Verification of the Climate Prediction Center Long-Lead Outlooks, 1995–98, Journal of Climate, 13 (7), 2389–2403, 2000.
Wilks, D. S., Statistical Methods in the Atmospheric Sciences; Academic Press: New York, 1995.
Wilson, Laurence J., William R. Burrows and Andreas Lanzinger, A Strategy for Verification of Weather Element Forecasts from an Ensemble Prediction System, Monthly Weather Review, 127 (6), 956–970, 1999.
Zhang, H. and T. Casey, Verification of Categorical Probability Forecasts, Weather and Forecasting, 15 (1), 80–89, 2000.
Zhang, S., R. J. Karunamuni and M. C. Jones, An Improved Estimator of the Density Function at the Boundary, Journal of the American Statistical Association, 94 (448), 1231–1241, 1999.