
1

Verification of nowcasts and very short range forecasts

Beth Ebert

BMRC, Australia

WWRP Int'l Symposium on Nowcasting and Very Short Range Forecasting, Toulouse, 5-9 Sept 2005

2

Why verify forecasts?

• To monitor performance over time → summary scores

• To evaluate and compare forecast systems → continuous and categorical scores

• To show impact of forecast → skill & value scores

• To understand error in order to improve forecast system → diagnostic methods

The verification approach taken depends on the purpose of the verification

3

Verifying nowcasts and very short range forecasts

Nowcast characteristics → Impact on verification

• Concerned mainly with high impact weather → rare events, difficult to verify in a systematic manner

• May detect severe weather elements → storm spotter observations & damage surveys required

• Observations-based → the same observations used to make the nowcast are often used to verify it

• High temporal frequency → many nowcasts to verify

• High spatial resolution → observation network usually not dense enough (except radar)

• Small spatial domain → relatively small number of standard observations

4

Observations – issues for nowcasts

Thunderstorms and severe weather (mesocyclones, hail, lightning, damaging winds)

• Spotter observations may contain error
• Biased observations
   • More observations during daytime & in populated areas
   • More storm reports when warnings were in effect
• Cell mis-association by cell tracking algorithms

Precipitation

• Radar rain rates contain error
• Scale mismatch between gauge observations and radar pixels

Observation error can be large but is usually neglected → more research required on handling observation error

5

Matching forecasts and observations

• Matching approach depends on
   • Nature of forecasts and observations
      • Scale
      • Consistency
      • Sparseness
   • Other matching criteria
      • Verification goals
      • Use of forecasts

• Matching approach can impact verification results

• Grid to grid approach
   • Overlay forecast and observed grids
   • Match each forecast and observation

[Figure: forecast grid and observed grid; point-to-grid and grid-to-point matching]
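The two matching strategies can be sketched in a few lines. This is a minimal illustration, assuming the forecast sits on a regular grid described by 1-D latitude and longitude axes and that NaN marks missing data; the function names are hypothetical, and nearest-neighbour lookup is only one grid-to-point option (bilinear interpolation is another).

```python
import numpy as np


def grid_to_grid_pairs(fcst_grid, obs_grid):
    """Overlay forecast and observed grids (same mesh) and return matched value pairs."""
    fcst_grid = np.asarray(fcst_grid, dtype=float)
    obs_grid = np.asarray(obs_grid, dtype=float)
    if fcst_grid.shape != obs_grid.shape:
        raise ValueError("forecast and observed grids must have the same shape")
    # Keep only grid points where both values exist (NaN marks missing data).
    valid = ~np.isnan(fcst_grid) & ~np.isnan(obs_grid)
    return fcst_grid[valid], obs_grid[valid]


def grid_to_point_pairs(fcst_grid, grid_lats, grid_lons, obs_lats, obs_lons, obs_values):
    """Match each point observation with the value at the nearest forecast grid point.

    Picks the nearest row and column independently along the 1-D axes
    (no map projection or great-circle distance is considered).
    """
    grid_lats = np.asarray(grid_lats, dtype=float)
    grid_lons = np.asarray(grid_lons, dtype=float)
    obs_lats = np.asarray(obs_lats, dtype=float)
    obs_lons = np.asarray(obs_lons, dtype=float)
    rows = np.abs(grid_lats[None, :] - obs_lats[:, None]).argmin(axis=1)
    cols = np.abs(grid_lons[None, :] - obs_lons[:, None]).argmin(axis=1)
    return np.asarray(fcst_grid, dtype=float)[rows, cols], np.asarray(obs_values, dtype=float)
```

Either function returns two aligned 1-D arrays, so the scores in the following slides can be computed the same way whichever matching approach is chosen.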

Forecast Quality Definitions: Wilson subjective categories

1 – forecast and observed overlap almost perfectly.

2 – the majority of observed and forecast echoes overlap, or offsets are < 50 km.

3 – forecast and observed look similar, but there are a number of echo offsets and several areas may be missing or extra.

4 – the forecast and observed are significantly different, with very little overlap, but some features are suggestive of what actually occurred.

5 – there is no resemblance between forecast and observed.

First rule of forecast verification – look at the results!

7

Systematic verification (many cases): aggregation and stratification

• Aggregation
   • More samples → more robust statistics
   • Across time: results for each point in space
   • Across space: results for each time
   • Across space and time: results summarized across spatial region and across time

• Stratification
   • Homogeneous subsamples → better understanding of how errors depend on regime
   • By location or region
   • By time period (diurnal or seasonal variation)
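Aggregation and stratification amount to choosing which axes to average over and which subsets to average separately. A minimal sketch, assuming forecasts and observations are stored as (time, y, x) arrays and using mean absolute error as the example score; the function names and the "night"/"day" labels are hypothetical.

```python
import numpy as np


def aggregate_mae(fcst, obs):
    """Mean absolute error of (time, y, x) fields, aggregated three different ways."""
    abs_err = np.abs(np.asarray(fcst, dtype=float) - np.asarray(obs, dtype=float))
    return {
        "per_point": abs_err.mean(axis=0),       # across time: one value per grid point
        "per_time": abs_err.mean(axis=(1, 2)),   # across space: one value per time
        "overall": float(abs_err.mean()),        # across space and time: a single number
    }


def stratify_mae(fcst, obs, labels):
    """MAE computed separately for each stratum; labels has one entry per time step."""
    abs_err = np.abs(np.asarray(fcst, dtype=float) - np.asarray(obs, dtype=float))
    labels = np.asarray(labels)
    return {lab.item(): float(abs_err[labels == lab].mean()) for lab in np.unique(labels)}
```

Stratifying by region works the same way with a mask over the spatial axes instead of labels along the time axis.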

8

Real-time nowcast verification

• Rapid feedback from latest radar scan

• Evaluate the latest objective guidance while it is still "fresh"

• Better understand strengths and weaknesses of nowcast system

• Tends to be subjective in nature

• Not commonly performed!

Real time forecast verification system (RTFV) under development in BMRC

[Figure: probability of exceedance (log scale, 10^-6 to 1) vs rain rate (0 to 70 mm/h) for the radar analysis and LAPS05]
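A curve like the one in the figure is the empirical probability that the rain rate exceeds each threshold. A minimal sketch, assuming the radar analysis and the model field are simply arrays of rain rates in mm/h; the function name is hypothetical.

```python
import numpy as np


def exceedance_probability(values, thresholds):
    """Fraction of values strictly greater than each threshold (e.g. rain rates in mm/h)."""
    values = np.asarray(values, dtype=float).ravel()
    return np.array([np.mean(values > t) for t in thresholds])
```

Computing this for the radar analysis and for the forecast at the same thresholds, and plotting both on a log-scaled probability axis, shows at a glance whether the forecast produces too many or too few intense rain rates.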

9

Post-event verification

• More observations may be available → verification results more robust

• No single measure is adequate!
   • Several metrics needed
   • Distributions-oriented verification
      • Scatter plots
      • (Multi-category) contingency tables
      • Box-whisker plots

• Confidence intervals recommended, especially when comparing one set of results with another
   • Bootstrap (resampling) method simple to apply

[Figure: frequency bias, POD, FAR and CSI]
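The bootstrap can be sketched as: resample the matched forecast/observation pairs with replacement many times, recompute the score each time, and take percentiles of the resulting distribution. This is a minimal percentile-bootstrap sketch for CSI with hypothetical names and defaults (1000 resamples, 95% interval). Resampling individual pairs treats them as independent; for spatially and temporally correlated nowcasts, resampling whole cases or days (a block bootstrap) is usually more defensible.

```python
import numpy as np


def csi(fcst_yes, obs_yes):
    """Critical success index from boolean forecast/observed event arrays."""
    h = np.sum(fcst_yes & obs_yes)
    m = np.sum(~fcst_yes & obs_yes)
    f = np.sum(fcst_yes & ~obs_yes)
    return h / (h + m + f) if (h + m + f) > 0 else np.nan


def bootstrap_ci(fcst_yes, obs_yes, score=csi, n_boot=1000, alpha=0.05, seed=0):
    """Score on the full sample plus a percentile bootstrap confidence interval."""
    fcst_yes = np.asarray(fcst_yes, dtype=bool)
    obs_yes = np.asarray(obs_yes, dtype=bool)
    rng = np.random.default_rng(seed)
    n = fcst_yes.size
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample matched pairs with replacement
        samples.append(score(fcst_yes[idx], obs_yes[idx]))
    lo, hi = np.nanpercentile(samples, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return score(fcst_yes, obs_yes), lo, hi
```

When two forecast systems are compared, resampling the same indices for both and bootstrapping the score difference gives a direct interval on "is A better than B".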

10

Accuracy – categorical verification

Standard categorical verification scores

PC = (H + CR) / N    proportion correct (accuracy)

Bias = (F + H) / (M + H)    frequency bias

POD = H / (H + M)    probability of detection

POFD = F / (CR + F)    probability of false detection

FAR = F / (H + F)    false alarm ratio

CSI = H / (H + M + F)    critical success index (threat score)

ETS = (H – Hrandom) / (H + M + F – Hrandom)    equitable threat score, where Hrandom = (H + M)(H + F) / N is the number of hits expected by chance

HSS = (H + CR – PCrandom) / (N – PCrandom)    Heidke skill score, where PCrandom = [(H + M)(H + F) + (CR + M)(CR + F)] / N is the number of correct forecasts expected by chance

HK = POD – POFD    Hanssen and Kuipers discriminant

OR = (H × CR) / (F × M)    odds ratio

Contingency table (N = H + M + F + CR):

                  Estimated yes       Estimated no
Observed yes      H = hits            M = misses
Observed no       F = false alarms    CR = correct rejections

[Figure: Venn diagram of forecast and observations showing H, F, M and CR]
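The formulas above translate directly into code. A minimal sketch, assuming the four contingency-table counts are already available; the function name is hypothetical and there is no guard against degenerate tables (for example M = 0 makes the odds ratio infinite).

```python
def categorical_scores(h, m, f, cr):
    """Standard 2x2 contingency-table scores from hits, misses, false alarms, correct rejections."""
    n = h + m + f + cr
    h_random = (h + m) * (h + f) / n                                 # hits expected by chance
    correct_random = ((h + m) * (h + f) + (cr + m) * (cr + f)) / n  # correct forecasts by chance
    pod = h / (h + m)
    pofd = f / (cr + f)
    return {
        "PC": (h + cr) / n,
        "Bias": (f + h) / (m + h),
        "POD": pod,
        "POFD": pofd,
        "FAR": f / (h + f),
        "CSI": h / (h + m + f),
        "ETS": (h - h_random) / (h + m + f - h_random),
        "HSS": (h + cr - correct_random) / (n - correct_random),
        "HK": pod - pofd,
        "OR": (h * cr) / (f * m),
    }
```

For example, H = 50, M = 20, F = 30, CR = 900 gives CSI = 0.5 and an odds ratio of 75, while PC = 0.95 is inflated by the many correct rejections, one reason a single score is never enough for rare events.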

11

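Accuracy – continuous verification

Standard continuous verification scores (computed over the entire domain), for forecast values F_i and observations O_i at N points, with means \bar{F} and \bar{O}:

bias = mean error
$$\text{bias} = \frac{1}{N}\sum_{i=1}^{N} (F_i - O_i)$$

MAE = mean absolute error
$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} |F_i - O_i|$$

RMSE = root mean square error
$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (F_i - O_i)^2}$$

r = correlation coefficient
$$r = \frac{\sum_{i} (F_i - \bar{F})(O_i - \bar{O})}{\sqrt{\sum_{i} (F_i - \bar{F})^2}\,\sqrt{\sum_{i} (O_i - \bar{O})^2}}$$

[Figure: forecast F and observations O over the verification domain]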
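The four continuous scores follow directly from the formulas. A minimal sketch over matched forecast/observation arrays (any shape, flattened); the function name is hypothetical.

```python
import numpy as np


def continuous_scores(fcst, obs):
    """Bias, MAE, RMSE and Pearson correlation of matched forecast/observation values."""
    f = np.asarray(fcst, dtype=float).ravel()
    o = np.asarray(obs, dtype=float).ravel()
    err = f - o
    fa, oa = f - f.mean(), o - o.mean()   # anomalies about the domain means
    return {
        "bias": float(err.mean()),
        "MAE": float(np.abs(err).mean()),
        "RMSE": float(np.sqrt((err ** 2).mean())),
        "r": float((fa * oa).sum() / np.sqrt((fa ** 2).sum() * (oa ** 2).sum())),
    }
```

Note that r is undefined when either field is constant over the domain (zero variance), which can happen for small rain areas.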

12

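Accuracy – probabilistic verification

Standard probabilistic verification scores/methods

Reliability diagram

Brier score, where p_i is the forecast probability and o_i = 1 if the event occurred, 0 otherwise
$$BS = \frac{1}{N}\sum_{i=1}^{N} (p_i - o_i)^2$$

Brier skill score
$$BSS = 1 - \frac{BS}{BS_{reference}}$$

Ranked probability score, over M ordered categories with cumulative distributions CDF
$$RPS = \frac{1}{M-1}\sum_{m=1}^{M} (CDF_{fcst,m} - CDF_{obs,m})^2$$

Relative operating characteristic (ROC)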
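The Brier score, Brier skill score and RPS can be sketched directly from the formulas above. This assumes binary outcomes coded 0/1 for the Brier score and a 0-based observed category index for the RPS; the function names are hypothetical, and the reference forecast (for example climatology) is passed in by the caller.

```python
import numpy as np


def brier_score(prob, obs):
    """BS = mean of (p_i - o_i)^2, with o_i = 1 if the event occurred, else 0."""
    p = np.asarray(prob, dtype=float)
    o = np.asarray(obs, dtype=float)
    return float(np.mean((p - o) ** 2))


def brier_skill_score(prob, obs, prob_reference):
    """BSS = 1 - BS / BS_reference."""
    return 1.0 - brier_score(prob, obs) / brier_score(prob_reference, obs)


def ranked_probability_score(fcst_probs, obs_category):
    """RPS for one forecast over M ordered categories; obs_category is a 0-based index."""
    fcst_cdf = np.cumsum(np.asarray(fcst_probs, dtype=float))
    obs_cdf = np.zeros_like(fcst_cdf)
    obs_cdf[obs_category:] = 1.0          # observed CDF is a step at the observed category
    m = fcst_cdf.size
    return float(np.sum((fcst_cdf - obs_cdf) ** 2) / (m - 1))
```

Averaging the single-forecast RPS over many cases gives the aggregate score; like BS, it is 0 for a perfect forecast.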

13

A forecast has skill if it is more accurate than a reference forecast (usually persistence, cell extrapolation, or random chance).

Skill scores measure the relative improvement of the forecast over the reference forecast:

$$\text{skill} = \frac{\text{score}_{forecast} - \text{score}_{reference}}{\text{score}_{perfect} - \text{score}_{reference}}$$

[Figure: Hanssen & Kuipers score vs forecast lead time (30 to 180 min) for rain thresholds > 0 mm, > 1 mm and > 5 mm; solid line = nowcast, dashed = extrapolation, dotted = gauge persistence]

Strategy 1: Plot the performance of the forecast system and the unskilled reference on the same diagram

[Figure: skill w.r.t. gauge persistence vs forecast lead time (30 to 180 min); solid line = nowcast, dashed = extrapolation]

Strategy 2: Plot the value of the skill score
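The generic skill score is a one-line formula; writing it with arrays lets it be applied at every lead time at once (strategy 2). A minimal sketch with a hypothetical function name; the same expression works for positively oriented scores (perfect = 1, e.g. CSI) and negatively oriented ones (perfect = 0, e.g. RMSE).

```python
import numpy as np


def skill_score(score_forecast, score_reference, score_perfect):
    """Generic skill score: 1 = perfect, 0 = no better than the reference, < 0 = worse."""
    f, r, p = (np.asarray(x, dtype=float) for x in (score_forecast, score_reference, score_perfect))
    return (f - r) / (p - r)
```

For instance, a CSI of 0.4 against a reference CSI of 0.25 gives a skill of 0.2, and an RMSE of 2 against a reference RMSE of 4 gives a skill of 0.5.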

14

Practically perfect hindcast – upper bound on accuracy

Approach: If the forecaster had all of the observations in advance, what would the "practically perfect" forecast look like?

• Apply a smoothing function to the observations to get probability contours, choose an appropriate yes/no threshold

• Did the actual forecast look like the practically perfect forecast?

• How did the performance of the actual forecast compare to the performance of the practically perfect forecast?

SPC convective outlook: CSI = 0.34; practically perfect hindcast: CSI = 0.48

Kay and Brooks, 2000

Convective outlook was 75% of the way to being "practically perfect"
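The smoothing step can be sketched as a Gaussian filter applied to a 0/1 grid of observed events, followed by a yes/no threshold. This is a minimal sketch, not the procedure of Kay and Brooks: the smoothing width `sigma` (in grid lengths), the `threshold` and the function names are hypothetical choices, and points outside the domain are treated as non-events.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian weights truncated at three standard deviations."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    weights = np.exp(-0.5 * (x / sigma) ** 2)
    return weights / weights.sum(), radius


def smooth_1d(values, weights, radius):
    """Convolve with zero padding so the output has the same length as the input."""
    return np.convolve(np.pad(values, radius), weights, mode="valid")


def practically_perfect(obs_events, sigma=2.0, threshold=0.1):
    """Turn a 0/1 grid of observed events into a smooth 'probability' field and a yes/no area."""
    weights, radius = gaussian_kernel(sigma)
    field = np.asarray(obs_events, dtype=float)
    # Separable Gaussian smoothing: every row, then every column.
    field = np.apply_along_axis(smooth_1d, 1, field, weights, radius)
    field = np.apply_along_axis(smooth_1d, 0, field, weights, radius)
    return field, field >= threshold
```

Scoring the yes/no area against the observed events (for example with CSI) gives the upper bound; the ratio of the real forecast's score to that bound is how "far along" the forecast is.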

15

"Double penalty"

Event predicted where it did not occur, no event predicted where it did occur

Big problem for nowcasts and other high resolution forecasts

Ex: Two rain forecasts giving the same volume

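High resolution forecast: RMS ~ 4.7, POD = 0, FAR = 1, CSI = 0

Low resolution forecast: RMS ~ 2.7, POD ~ 1, FAR ~ 0.7, CSI ~ 0.3

[Figure: schematic fcst and obs rain features for the high resolution (left) and low resolution (right) forecasts]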
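The double penalty is easy to reproduce with a toy example. The sketch below uses a hypothetical 1-D transect of 12 grid points (not the case on the slide): the observed feature, a sharp but displaced forecast, and a smooth forecast all carry the same rain volume, and rain ≥ 1 counts as an event.

```python
import numpy as np


def rmse(fcst, obs):
    return float(np.sqrt(np.mean((fcst - obs) ** 2)))


def pod_far_csi(fcst, obs, threshold=1.0):
    """POD, FAR and CSI for the event 'rain >= threshold' at each grid point."""
    f_yes, o_yes = fcst >= threshold, obs >= threshold
    hits = np.sum(f_yes & o_yes)
    misses = np.sum(~f_yes & o_yes)
    false_alarms = np.sum(f_yes & ~o_yes)
    return (float(hits / (hits + misses)),
            float(false_alarms / (hits + false_alarms)),
            float(hits / (hits + misses + false_alarms)))


# Hypothetical transect; every field holds the same rain volume (30).
obs = np.zeros(12)
obs[3:6] = 10.0         # observed feature
high_res = np.zeros(12)
high_res[7:10] = 10.0   # sharp forecast, displaced by 4 points
low_res = np.zeros(12)
low_res[1:11] = 3.0     # smooth forecast covering the feature

for name, fcst in (("high resolution", high_res), ("low resolution", low_res)):
    print(name, round(rmse(fcst, obs), 2), pod_far_csi(fcst, obs))
```

The displaced sharp forecast is penalised twice (a miss where rain fell and a false alarm where it did not): POD = 0, FAR = 1, CSI = 0 and RMSE ≈ 7.1, while the smooth forecast gets POD = 1, FAR = 0.7, CSI = 0.3 and RMSE ≈ 4.2.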

16

Value

A forecast has value if it helps a user make a better decision.

Value scores measure the relative economic value of the forecast over some reference forecast:

$$\text{value} = \frac{\text{expense}_{forecast} - \text{expense}_{reference}}{\text{expense}_{perfect} - \text{expense}_{reference}}$$

The most accurate forecast is not always the most valuable!

Baldwin and Kain, 2004

Expense depends on the cost of taking preventative action and the loss incurred for a missed event.

Small or rare events with high losses: value maximized by over-prediction

Events with high costs and displacement error likely: value maximized by under-prediction
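The expense terms can be made concrete with the simple cost/loss model: protecting costs C, an unprotected event costs L. This sketch assumes that model with climatology ("always protect" or "never protect", whichever is cheaper) as the reference; it is an illustration, not necessarily the formulation used by Baldwin and Kain, and the function name and example tables are hypothetical.

```python
def relative_value(h, m, f, cr, cost, loss):
    """Relative economic value in the simple cost/loss model, climatological reference."""
    n = h + m + f + cr
    base_rate = (h + m) / n
    expense_forecast = ((h + f) * cost + m * loss) / n   # protect whenever "yes" is forecast
    expense_reference = min(cost, base_rate * loss)      # always or never protect
    expense_perfect = base_rate * cost                   # protect only when the event occurs
    return (expense_forecast - expense_reference) / (expense_perfect - expense_reference)
```

With a cost/loss ratio of 0.1, a table with H = 30, M = 10, F = 20, CR = 940 has a value of about 0.69, while an over-forecasting table (H = 38, M = 2, F = 80, CR = 880) scores about 0.73; at a cost/loss ratio of 0.5 the ranking reverses, in line with the statements above.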

17

Exact match vs. "close enough"

Need we get a high resolution forecast exactly right?

Often "close" is still useful to a forecaster

YES

• High stakes situations (e.g. space shuttle launch, hurricane landfall)

• Hydrological applications (e.g. flash floods)

• Topographically influenced weather (valley winds, orographic rain, etc.)

→ Standard verification methods appropriate (POD, FAR, CSI, bias, RMSE, correlation, etc.)

NO

• Guidance for forecasters

• Model validation (does it predict what we expect it to predict?)

• Observations may not allow standard verification of high resolution forecasts

→ "Fuzzy" verification methods and diagnostic methods: verify attributes of the forecast

18

"Fuzzy" verification methods

• Large forecast and observed variability at high resolution

• Fuzzy verification methods don't require an exact match between forecasts and observations to get a good score

• Vary the size of the space / time neighborhood around a point
   • Damrath, 2004
   • Rezacova and Sokol, 2004 *
   • Theis et al., 2005
   • Roberts, 2004 *
   • Germann and Zawadzki, 2004

• Also vary magnitude, other elements
   • Atger, 2001

• Evaluate using categorical, continuous, probabilistic scores / methods

* Giving a talk in this Symposium

[Figure: space / time neighborhood at times t - 1, t and t + 1; frequency distribution of forecast values in the neighborhood (Sydney)]

Forecasters don't (shouldn't!) take a high resolution forecast at face value – instead they interpret it in a probabilistic way.
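One simple neighborhood method compares the fraction of "yes" points in a square window around each grid point instead of requiring a point-by-point match. The sketch below computes a fractions-skill-score-style comparison; it is one illustrative neighborhood method, not a reproduction of any of the cited papers, and the function names, the odd window size `n` and zero padding at the domain edge are assumptions.

```python
import numpy as np


def neighborhood_fraction(event, n):
    """Fraction of 'yes' points in the n x n window centred on each grid point (n odd).

    Points outside the domain count as 'no' (zero padding).
    """
    if n % 2 != 1:
        raise ValueError("n must be odd")
    r = n // 2
    padded = np.pad(np.asarray(event, dtype=float), r)
    # Summed-area table with a leading row and column of zeros.
    table = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    table[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    ny, nx = np.shape(event)
    window_sum = (table[n:n + ny, n:n + nx] - table[:ny, n:n + nx]
                  - table[n:n + ny, :nx] + table[:ny, :nx])
    return window_sum / (n * n)


def fractions_skill_score(fcst, obs, threshold, n):
    """1 = identical neighborhood fractions, 0 = no overlap at all."""
    pf = neighborhood_fraction(np.asarray(fcst) >= threshold, n)
    po = neighborhood_fraction(np.asarray(obs) >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)  # MSE if the fraction fields never overlapped
    return float(1.0 - mse / mse_ref) if mse_ref > 0 else float("nan")
```

Plotting the score against `n` (and against `threshold`) shows the scale at which a displaced forecast becomes "close enough"; with `n = 1` it reduces to a strict grid-point comparison.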

19

Spatial multi-event contingency table

Verify using the Relative Operating Characteristic (ROC)

• Measures how well the forecast can separate events from non-events based on some decision threshold

Decision thresholds to vary:

• magnitude (ex: 1 mm h-1 to 20 mm h-1)

• distance from point of interest (ex: within 10 km, ... , within 100 km)

• timing (ex: within 1 h, ... , within 12 h)

• anything else that may be important in interpreting the forecast

Can apply to ensembles, and to compare deterministic forecasts to ensemble forecasts

[Figure: ROC curve for varying rain threshold, with the single-threshold point marked; ROC curve for ensemble forecast, varying rain threshold (Atger, 2001)]
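An ROC curve for a deterministic forecast can be built by varying the decision threshold: at each threshold the forecast becomes "yes" where it reaches that value, giving one (POFD, POD) point. A minimal sketch for the magnitude threshold, with hypothetical function names; the area under the curve uses the trapezoidal rule, closed with the points (0, 0) and (1, 1).

```python
import numpy as np


def roc_points(fcst_values, obs_event, thresholds):
    """POFD and POD obtained by turning the forecast into 'yes' wherever it reaches each threshold."""
    fcst_values = np.asarray(fcst_values, dtype=float)
    obs_event = np.asarray(obs_event, dtype=bool)
    pofd, pod = [], []
    for t in thresholds:
        yes = fcst_values >= t
        hits = np.sum(yes & obs_event)
        misses = np.sum(~yes & obs_event)
        false_alarms = np.sum(yes & ~obs_event)
        correct_negatives = np.sum(~yes & ~obs_event)
        pod.append(hits / (hits + misses))
        pofd.append(false_alarms / (false_alarms + correct_negatives))
    return np.array(pofd), np.array(pod)


def roc_area(pofd, pod):
    """Trapezoidal area under the ROC curve (1 = perfect discrimination, 0.5 = no skill)."""
    x = np.concatenate(([0.0], pofd, [1.0]))
    y = np.concatenate(([0.0], pod, [1.0]))
    order = np.lexsort((y, x))  # sort by POFD, then by POD
    x, y = x[order], y[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
```

Distance or timing thresholds fit the same pattern: replace `fcst_values >= t` with "forecast rain within t km (or t hours) of the point".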

20

Object- and entity-based verification

• Consistent with human interpretation

• Provides diagnostic information on whole-system properties
   • Location
   • Amplitude
   • Size
   • Shape

• Techniques
   • Contiguous Rain Area (CRA) verification (Ebert and McBride, 2000)
   • NCAR object-oriented approach* (Brown et al., 2004)
   • Cluster analysis (Marzban and Sandgathe, 2005)
   • Composite method (Nachamkin, 2004)

[Figure: NCAR object-oriented approach, forecast objects Af to Df and observed objects Ao to Do; cluster analysis of an MM5 forecast, 8 clusters identified in x-y-p space]
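Most object-based methods start by identifying objects as contiguous areas above a threshold and then computing their properties. A minimal sketch of that first step, assuming 4-connectivity and a plain breadth-first flood fill (a library routine such as an image-labelling function could replace it); the function names are hypothetical, and none of the cited techniques is reproduced here.

```python
from collections import deque

import numpy as np


def label_objects(field, threshold):
    """Label 4-connected regions where field >= threshold; 0 marks background."""
    mask = np.asarray(field, dtype=float) >= threshold
    labels = np.zeros(mask.shape, dtype=int)
    ny, nx = mask.shape
    count = 0
    for i in range(ny):
        for j in range(nx):
            if mask[i, j] and labels[i, j] == 0:
                count += 1                      # start a new object
                labels[i, j] = count
                queue = deque([(i, j)])
                while queue:                    # flood fill the connected region
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < ny and 0 <= xx < nx and mask[yy, xx] and labels[yy, xx] == 0:
                            labels[yy, xx] = count
                            queue.append((yy, xx))
    return labels, count


def object_properties(field, labels, count):
    """Area (grid points), centroid (row, column), mean and maximum value of each object."""
    field = np.asarray(field, dtype=float)
    props = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        vals = field[ys, xs]
        props.append({"area": int(ys.size),
                      "centroid": (float(ys.mean()), float(xs.mean())),
                      "mean": float(vals.mean()),
                      "max": float(vals.max())})
    return props
```

Running this on both the forecast and the observed field gives two object lists; matching them (for example by centroid distance or overlap) is where the published methods differ.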

21

Contiguous Rain Area (CRA) verification

• Define entities using threshold (Contiguous Rain Areas)

• Horizontally translate the forecast until a pattern matching criterion is met:
   • minimum total squared error
   • maximum correlation
   • maximum overlap

• The displacement is the vector difference between the original and final locations of the forecast.

• Compare properties of matched entities
   • area
   • mean intensity
   • max intensity
   • shape, etc.

Ebert and McBride, 2000

[Figure: observed (Obs) and forecast (Fcst) rain entities]
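The translation step with the minimum-squared-error criterion can be sketched as a brute-force search over integer grid shifts. This is a simplified illustration: the error is computed over the whole (small) domain rather than over the CRA region around the entities, vacated points are filled with zeros, and the function names and `max_shift` limit are hypothetical.

```python
import numpy as np


def shift_field(a, dy, dx):
    """Translate a 2-D field by (dy, dx) grid points, filling vacated points with zero."""
    out = np.zeros_like(a)
    ny, nx = a.shape
    src_y = slice(max(0, -dy), min(ny, ny - dy))
    dst_y = slice(max(0, dy), min(ny, ny + dy))
    src_x = slice(max(0, -dx), min(nx, nx - dx))
    dst_x = slice(max(0, dx), min(nx, nx + dx))
    out[dst_y, dst_x] = a[src_y, src_x]
    return out


def cra_best_shift(fcst, obs, max_shift=5):
    """Integer translation (dy, dx) of the forecast that minimises the mean squared error."""
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    best = (0, 0, np.mean((fcst - obs) ** 2))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            mse = np.mean((shift_field(fcst, dy, dx) - obs) ** 2)
            if mse < best[2]:
                best = (dy, dx, mse)
    return best
```

The returned (dy, dx) is the translation that best aligns the forecast with the observations, i.e. the displacement in grid units; swapping the criterion for maximum correlation or maximum overlap only changes the quantity compared inside the loop.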

22

Error decomposition methods

• Attempt to quantify the causes of the errors

• Some approaches:

• CRA verification (Ebert and McBride, 2000)

$MSE_{total} = MSE_{displacement} + MSE_{volume} + MSE_{pattern}$

• Feature calibration and alignment (Nehrkorn et al., 2003)

$E(x,y) = E_{phase}(x,y) + E_{local\ bias}(x,y) + E_{residual}(x,y)$

• Acuity-fidelity approach (Marshall et al., 2004)

minimize cost function $J = J_{distance} + J_{timing} + J_{intensity} + J_{misses}$ from both perspectives of forecast (fidelity) and observations (acuity)

• Error separation (Ciach and Krajewski, 1999)

$MSE_{forecast} = MSE_{true} + MSE_{reference}$
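The CRA decomposition can be written out once the best-matching shifted forecast is known: the displacement error is the part of the MSE removed by shifting, the volume error is the squared difference of the means, and the pattern error is what remains. A minimal sketch of that form, computed over the whole array rather than the CRA region (a simplification), with a hypothetical function name; the shifted forecast is supplied by the caller, for example from a shift search.

```python
import numpy as np


def cra_error_decomposition(obs, fcst, fcst_shifted):
    """Split the total MSE into displacement, volume and pattern components."""
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    fs = np.asarray(fcst_shifted, dtype=float)
    mse_total = np.mean((fcst - obs) ** 2)
    mse_shifted = np.mean((fs - obs) ** 2)
    mse_displacement = mse_total - mse_shifted          # error removed by the best shift
    mse_volume = (fs.mean() - obs.mean()) ** 2          # difference in mean rain amount
    mse_pattern = mse_shifted - mse_volume              # remaining fine-scale structure error
    return {"total": float(mse_total), "displacement": float(mse_displacement),
            "volume": float(mse_volume), "pattern": float(mse_pattern)}
```

By construction the three components add up to the total MSE, so each can be reported as a percentage of the error.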

23

Scale separation methods

• Measure correspondence between forecast and observations at a variety of spatial scales

• Some approaches:
   • Multiscale statistical properties (Zepeda-Arce et al., 2000; Harris et al., 2001)
   • Scale recursive estimation (Tustison et al., 2003)
   • Intensity-scale approach* (Casati et al., 2004)

[Figure: data sources at different scales: SATELLITE = 0, MODEL = 1, RADAR = 2, RAIN GAUGES = 3]
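In the spirit of the intensity-scale idea, both fields can be thresholded, the binary error field formed, and its MSE split across spatial scales with a 2-D Haar decomposition (successive 2 × 2 block averages). This is a simplified sketch under stated assumptions, not the published method: it assumes a square 2^J × 2^J domain, and the function names are hypothetical. Because the Haar components are mutually orthogonal, their MSEs add up to the MSE of the binary error field.

```python
import numpy as np


def block_mean(a, b):
    """Average over non-overlapping b x b blocks, expanded back to the original grid."""
    ny, nx = a.shape
    m = a.reshape(ny // b, b, nx // b, b).mean(axis=(1, 3))
    return np.repeat(np.repeat(m, b, axis=0), b, axis=1)


def mse_by_scale(fcst, obs, threshold):
    """MSE of each Haar scale component of the binary error field, finest scale first."""
    err = ((np.asarray(fcst, dtype=float) >= threshold).astype(float)
           - (np.asarray(obs, dtype=float) >= threshold).astype(float))
    n = err.shape[0]
    if err.shape != (n, n) or n & (n - 1):
        raise ValueError("a square 2^J x 2^J domain is assumed")
    components = []
    previous = err
    b = 2
    while b <= n:
        smooth = block_mean(err, b)
        components.append(float(np.mean((previous - smooth) ** 2)))  # detail at block size b/2
        previous = smooth
        b *= 2
    components.append(float(np.mean(previous ** 2)))  # largest scale: domain-mean error (bias)
    return components
```

Repeating this for several thresholds gives an error map over intensity and scale, showing for example whether errors for heavy rain are concentrated at the smallest scales.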

24

Summary

• Nowcasts and very short range forecasts present some unique challenges for verification
   • High impact weather
   • High resolution forecasts
   • Imperfect observations

• There is still a place for standard scores
   • Historical reasons
   • When highly accurate forecasts are required
   • Useful for monitoring improvement
   • Must use several metrics
   • Please quantify uncertainty, especially when intercomparing forecast schemes
   • Compare with unskilled forecast such as persistence

25

Summary (cont'd)

• Evolving concept of what makes a "good" forecast
   • Recognizing value of "close enough"
   • Probabilistic view of deterministic forecasts

• Exciting new developments of diagnostic methods to better understand the nature and causes of forecast errors
   • Object- and entity-based
   • Error decomposition
   • Scale separation

26

http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.html
