DESCRIPTION
Application of Forecast Verification Science to Operational River Forecasting in the National Weather Service. Julie Demargne, James Brown, Yuqiong Liu and D-J Seo. UCAR. NROW, November 4-5, 2009.
TRANSCRIPT
Application of Forecast Verification Science to Operational River Forecasting in the National Weather Service
Julie Demargne, James Brown, Yuqiong Liu and D-J Seo
NROW, November 4-5, 2009
UCAR
Approach to river forecasting
[Figure: schematic of the river forecasting approach. A Sacramento-type soil moisture accounting model (upper and lower zones with tension and free water storages; infiltration, percolation, evapotranspiration, surface runoff, direct runoff, interflow, baseflow, subsurface outflow) links observations, models, and input forecasts to forecast products, with forecasters and users in the loop.]
Where is the Verification?
???
In the past
• Limited verification of hydrologic forecasts
• How good are the forecasts for application X?
Where is the Verification?
!!!
Now:
Verification Experts
Verification Systems
Verification Products
Papers
Hydrologic forecasting: a multi-scale problem
National → major river system → river basin with river forecast points → forecast group → headwater basin with radar rainfall grid → high-resolution flash flood basins
Hydrologic forecasts must be verified consistently across all spatial scales and resolutions.
Hydrologic forecasting: a multi-scale problem
Seamless probabilistic water forecasts are required for all lead times and all users; so is verification information.
[Figure: benefits versus forecast lead time and forecast uncertainty, from minutes and hours (protection of life and property, flood mitigation and navigation) through days and weeks (hydropower, recreation, reservoir control) to months, seasons, and years (agriculture, health, commerce, ecosystem, environment, state/local planning).]
Need for hydrologic forecast verification
• In 2006, the NRC recommended that the NWS expand verification of its uncertainty products and make the information easily available to all users in near real time
Users decide whether to take action via risk-based decision making
Users must be educated on how to interpret forecast and verification information
River forecast verification service
http://www.nws.noaa.gov/oh/rfcdev/docs/NWS-Hydrologic-Forecast-Verification-Team_Final-report_Sep09.pdf.pdf
http://www.nws.noaa.gov/oh/rfcdev/docs/Final_Verification_Report.pdf
River forecast verification service
• To help us answer:
How good are the forecasts for application X?
What are the strengths and weaknesses of the forecasts?
What are the sources of error and uncertainty in the forecasts?
How are new science and technology improving the forecasts and the verifying observations?
What should be done to improve the forecasts?
Do forecasts help users in their decision making?
River forecast verification service
[Figure: the river forecasting system schematic (hydrologic model, observations, models, input forecasts, forecast products) extended with verification systems and verification products serving both forecasters and users.]
River forecast verification service
• Verification Service within the Community Hydrologic Prediction System (CHPS) to:
Compute metrics
Display data & metrics
Disseminate data & metrics
Provide real-time access to metrics
Analyze uncertainty and error in forecasts
Track performance
Verification challenges
• Verification is useful if the information generated leads to decisions about the forecast/system being verified
Verification needs to be user oriented
• No single verification measure provides complete information about the quality of a forecast product
Several verification metrics and products are needed
• To facilitate communication of forecast quality, common verification practices and products are needed from weather and climate forecasts to water forecasts
Collaborations between the meteorology and hydrology communities are needed (e.g., Thorpex-Hydro, HEPEX)
Verification challenges: two classes of verification
• Diagnostic verification: to diagnose and improve model performance
Done off-line with archived forecasts or hindcasts to analyze forecast quality relative to different conditions/processes
• Real-time verification: to help forecasters and users make decisions in real time
Done in real time (before the verifying observation occurs) using information from historical analogs and/or past forecasts and verifying observations under similar conditions
Diagnostic verification products
• Key verification metrics for 4 levels of information for single-valued and probabilistic forecasts
1. Observations-forecasts comparisons (scatter plots, box plots, time series plots)
2. Summary verification (e.g. MAE/Mean CRPS, skill score)
3. More detailed verification (e.g. measures of reliability, resolution, discrimination, correlation, results for specific conditions)
4. Sophisticated verification (e.g. for specific events with ROC)
To be evaluated by forecasters and forecast users
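As a concrete illustration of the level 2 summary metrics named above (MAE/mean CRPS and a skill score), here is a minimal sketch in Python. It is not the operational IVP/EVS code; the function names are hypothetical, and the CRPS uses the standard ensemble identity CRPS = E|X - y| - 0.5 E|X - X'|.

```python
import numpy as np

def mean_crps(ensembles, obs):
    """Mean continuous ranked probability score over all forecast cases.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X'
    are independent draws from the ensemble and y is the observation.
    """
    ensembles = np.asarray(ensembles, dtype=float)  # shape (cases, members)
    obs = np.asarray(obs, dtype=float)              # shape (cases,)
    # Mean absolute difference between members and the observation
    term1 = np.mean(np.abs(ensembles - obs[:, None]), axis=1)
    # Mean absolute difference between all member pairs
    term2 = 0.5 * np.mean(
        np.abs(ensembles[:, :, None] - ensembles[:, None, :]), axis=(1, 2))
    return float(np.mean(term1 - term2))

def skill_score(score, reference_score):
    """Generic skill score: 1 - score/reference (1 = perfect, 0 = no skill)."""
    return 1.0 - score / reference_score
```

The reference score would typically come from climatology or persistence forecasts, as discussed on the reference-forecast science issue later in the talk.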
Diagnostic verification products
• Examples for level 1: scatter plot, box-and-whiskers plot
[Figure: scatter plot of forecast value versus observed value, with a user-defined threshold marked.]
Diagnostic verification products
• Examples for level 1: box-and-whiskers plot
[Figure: box-and-whiskers plot of forecast error (forecast - observed) [mm] versus observed daily total precipitation [mm] for the American River in California, 24-hr precipitation ensembles (lead day 1). Each box summarizes the errors for one forecast (min., 10%, 20%, median, 80%, 90%, max.); a zero-error line separates low bias from high bias, and "blown" forecasts stand out.]
Diagnostic verification products
• Examples for level 2: skill score maps by month
[Figure: skill score maps for January, April, and October; the smaller the score, the better.]
Diagnostic verification products
• Examples for level 3: more detailed plots
[Figure: scores plotted for performance under different conditions and for different months.]
Diagnostic verification products
• Examples for level 4: event-specific plots
[Figure: ROC plot of Probability of Detection (POD) versus Probability of False Detection (POFD) for the event "> 85th percentile of the observed distribution", with perfect discrimination marked; and a reliability diagram of observed frequency versus predicted probability, with perfect reliability on the diagonal.]
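The POD/POFD pair behind an ROC point can be counted directly from a 2x2 contingency table; this is an illustrative stand-in (hypothetical names), not NWS software. Sweeping the probability threshold from 0 to 1 traces the full ROC curve.

```python
import numpy as np

def pod_pofd(event_probs, event_occurred, prob_threshold=0.5):
    """One ROC point: issue a 'yes' when forecast probability >= threshold.

    POD  = hits / (hits + misses)                 -- fraction of events detected
    POFD = false alarms / (false alarms + correct negatives)
    """
    p = np.asarray(event_probs, dtype=float)
    o = np.asarray(event_occurred, dtype=bool)
    yes = p >= prob_threshold
    hits = np.sum(yes & o)
    misses = np.sum(~yes & o)
    false_alarms = np.sum(yes & ~o)
    correct_neg = np.sum(~yes & ~o)
    pod = float(hits / (hits + misses))
    pofd = float(false_alarms / (false_alarms + correct_neg))
    return pod, pofd
```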
Diagnostic verification products
• Examples for level 4: user-friendly spread-bias plot
[Figure: spread-bias plot. 60% of the time, the observation should fall in the window covering the middle 60% of the ensemble (i.e., median ±30%); the plotted "hit rate" (e.g., 90%) is compared against the perfect diagonal, with departures indicating an "underspread" ensemble.]
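The spread-bias "hit rate" amounts to a containment check: how often does the observation land inside the central window of the ensemble? A minimal sketch, assuming an equal-weight ensemble and empirical quantiles (hypothetical function name):

```python
import numpy as np

def window_hit_rate(ensembles, obs, coverage=0.6):
    """Fraction of cases where the observation falls inside the central
    `coverage` window of the ensemble (e.g., median +/- 30% for 0.6).
    For a reliable ensemble this fraction should match `coverage` itself.
    """
    ensembles = np.asarray(ensembles, dtype=float)  # shape (cases, members)
    obs = np.asarray(obs, dtype=float)              # shape (cases,)
    lo_q = 0.5 - coverage / 2.0                     # e.g. 0.2
    hi_q = 0.5 + coverage / 2.0                     # e.g. 0.8
    lo = np.quantile(ensembles, lo_q, axis=1)
    hi = np.quantile(ensembles, hi_q, axis=1)
    return float(np.mean((obs >= lo) & (obs <= hi)))
```

Evaluating this for several window sizes gives the points of a spread-bias plot like the one above.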
Diagnostic verification analyses
• Analyze any new forecast process with verification
• Use different temporal aggregations
Analyze verification statistics as a function of lead time
If performance is similar across lead times, data can be pooled
• Perform spatial aggregation carefully
Analyze results for each basin and plot results on spatial maps
Use normalized metrics (e.g. skill scores)
Aggregate verification results across basins with similar hydrologic processes (e.g. by response time)
• Report verification scores with sample size
In the future, confidence intervals
Diagnostic verification analyses
• Evaluate forecast performance under different conditions
w/ time conditioning: by month, by season
w/ atmospheric/hydrologic conditioning: – low/high probability threshold
– absolute thresholds (e.g., PoP, Flood Stage)
Check that sample size is not too small
• Analyze sources of uncertainty and error
Verify forcing input forecasts and output forecasts
For extreme events, verify both stage and flow
Sensitivity analysis to be set up at all RFCs:
1) what is the optimal QPF horizon for hydrologic forecasts?
2) do run-time modifications made on the fly improve forecasts?
Diagnostic verification software
• Interactive Verification Program (IVP) developed at OHD:
verifies single-valued forecasts at given locations/areas
Diagnostic verification software
• Ensemble Verification System (EVS) developed at OHD:
verifies ensemble forecasts at given locations/areas
Dissemination of diagnostic verification
• Example: WR water supply website
http://www.nwrfc.noaa.gov/westernwater/
Data Visualization
Error: MAE, RMSE; conditional on lead time, year
Skill: skill relative to climatology; conditional
Categorical: FAR, POD, contingency table (based on climatology or user definable)
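The categorical metrics listed above (FAR, POD) come straight from the cells of the yes/no contingency table; a minimal sketch with hypothetical names, not the website's implementation:

```python
def categorical_metrics(forecast_yes, observed_yes):
    """POD and FAR from a 2x2 contingency table of yes/no forecasts.

    POD = hits / (hits + misses)         -- fraction of events detected
    FAR = false alarms / (hits + false alarms)
    """
    hits = misses = false_alarms = 0
    for f, o in zip(forecast_yes, observed_yes):
        if f and o:
            hits += 1
        elif not f and o:
            misses += 1
        elif f and not o:
            false_alarms += 1
    pod = hits / (hits + misses) if hits + misses else float('nan')
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float('nan')
    return pod, far
```

The yes/no events could be defined against climatology or a user-definable threshold, as the slide notes.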
Dissemination of diagnostic verification
• Example: OHRFC bubble plot online
http://www.erh.noaa.gov/ohrfc/bubbles.php
Real-time verification
• How good could the 'live' forecast be?
[Figure: hydrograph comparing the 'live' forecast with observations to date.]
Real-time verification
• Select analogs from a pre-defined set of historical events and compare with the 'live' forecast
[Figure: the live forecast and observations alongside analog forecasts and analog observations (Analogs 1-3); conclusion: "Live forecast for flood is likely to be too high."]
Real-time verification
• Adjust the 'live' forecast based on info from the historical analogs
[Figure: the live forecast compared with what happened; the live forecast was too high.]
Real-time verification
• Example for ensemble forecasts
[Figure: temperature (°F) versus forecast lead day for the live forecast (L), analog observations, and analog forecasts (H) selected with μH = μL ± 1.0°C; conclusion: "Day 1 forecast is probably too high."]
Real-time verification
• Build analog query prototype using multiple criteria
Seeking analogs for precipitation: “Give me past forecasts for the 10 largest events relative to hurricanes for this basin.”
Seeking analogs for temperature: “Give me all past forecasts with lead time 12 hours whose ensemble mean was within 5% of the live ensemble mean.”
Seeking analogs for flow: “Give me all past forecasts with lead times of 12-48 hours whose probability of flooding was >=0.95, where the basin-averaged soil-moisture was > x and the immediately prior observed flow exceeded y at the forecast issue time”.
Requires forecasters’ input!
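The third query above (flow analogs) can be sketched as a multi-criteria filter over an archive of past forecasts. The record schema and threshold values here are hypothetical stand-ins for whatever the prototype actually stores:

```python
def select_flow_analogs(archive, flood_prob_min=0.95,
                        soil_moisture_min=0.30, prior_flow_min=500.0):
    """Return past forecasts matching the multi-criteria flow query:
    lead times of 12-48 hours, probability of flooding >= flood_prob_min,
    basin-averaged soil moisture > soil_moisture_min, and observed flow
    at issue time > prior_flow_min.

    Each archive record is assumed to be a dict with keys
    'lead_hours', 'flood_prob', 'soil_moisture', and 'prior_flow'.
    """
    return [f for f in archive
            if 12 <= f['lead_hours'] <= 48
            and f['flood_prob'] >= flood_prob_min
            and f['soil_moisture'] > soil_moisture_min
            and f['prior_flow'] > prior_flow_min]
```

The precipitation and temperature queries would be further filters of the same shape, which is why forecasters' input on the criteria matters.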
Outstanding science issues
• Define meaningful reference forecasts for skill scores
• Separate timing error and amplitude error in forecasts
• Verify rare events and specify sampling uncertainty in metrics
• Analyze sources of uncertainty and error in forecasts
• Consistently verify forecasts on multiple space and time scales
• Verify multivariate forecasts (issued at multiple locations and for multiple time steps) by accounting for statistical dependencies
• Account for observational error (measurement and representativeness errors) and rating curve error
• Account for non-stationarity (e.g., climate change)
Verification service development
[Figure: collaboration map linking OHD, OCWWS, and NCEP with forecasters, users, academia, forecast agencies, and the private sector: OHD-NCEP Thorpex-Hydro project; HEPEX Verification Test Bed (CMC, Hydro-Quebec, ECMWF); OHD-Deltares collaboration for CHPS enhancements; COMET-OHD-OCWWS collaboration on training.]
Looking ahead
• 2012: Info on quality of forecast service available online
real-time and diagnostic verification implemented in CHPS
RFC verification standard products available online along with forecasts
• 2015: Leveraging grid-based verification tools
Extra slide
Diagnostic verification products
• Key verification metrics from NWS Verification Team report