verification continued… holly c. hartmann
DESCRIPTION
Verification Continued… Holly C. Hartmann Department of Hydrology and Water Resources University of Arizona [email protected]. RFC Verification Workshop, 08/14/2007. Agenda. Introduction to Verification Applications, Rationale, Basic Concepts Data Visualization and Exploration - PowerPoint PPT PresentationTRANSCRIPT
1
Verification Continued…
Holly C. Hartmann
Department of Hydrology and Water ResourcesUniversity of Arizona
RFC Verification Workshop, 08/14/2007
2
1. Introduction to Verification- Applications, Rationale, Basic Concepts- Data Visualization and Exploration- Deterministic Scalar measures
2. Categorical measures – KEVIN WERNER- Deterministic Forecasts- Ensemble Forecasts
3. Diagnostic Verification- Reliability- Discrimination- Conditioning/Structuring Analyses
4. Lab Session/Group Exercise- Developing Verification Strategies- Connecting to Forecast Operations and Users
Agenda
3
Probabilistic Ensemble Forecasts
From: California-Nevada River Forecast Center
4
Probabilistic Ensemble Forecasts
From: California-Nevada River Forecast Center
5
Probabilistic Ensemble Forecasts
From: A. Hamlet, University of Washington
6
From: A. Hamlet, University of Washington
7
Probabilistic Ensemble Forecasts
From: A. Hamlet, University of Washington
8
• Identifies systematic flaws of an ensemble prediction system.• Shows effectiveness of ensemble distribution in sampling the observations. • Does not indicate that the ensemble will be of practical use.
Talagrand Diagram – Also Called Ranked Histogram
9
With only one ensemble member ( | ) all (100%) observations ( ) will fall “outside”
With two ensemble members two out ofthree observations ( 2/3=67%) should fall outside
With three ensemble members two out of four observations ( 2/4=50%) should fall outside
|
| |
| | | For any number of ensemble members, 2/#members should fall outside the ensemble
• Identifies systematic flaws of an ensemble prediction system.• Shows effectiveness of ensemble distribution in sampling the observations. • Does not indicate that the ensemble will be of practical use.
Principle Behind Talagrand Diagram
Talagrand Diagram – Also Called Ranked Histogram
Adapted from A. Persson, 2006
10
Talagrand Diagram Computation Example
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Four sample ensemble members (E1 – E4) for daily flow forecasts (produced from reforecasts using carryover each year)
Step 1: Rank lowest to highest for each year. Four members results in 5 bins.
Step 2: Determine which bin the corresponding observation falls into.
Step 3: Tally how many observations fall in each bin.
Step 4: Plot frequency of observations for ranked bin.
Bin #
5
3
5
4
5
3
4
4
5
Bin1 Bin2 Bin3 Bin4 Bin5
Bin # Tally
1 2 3 4 5
11
Talagrand Diagram Computation Example
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Four sample ensemble members (E1 – E4) for daily flow forecasts (produced from reforecasts using carryover each year)
Step 1: Rank lowest to highest for each year. Four members results in 5 bins.
Step 2: Determine which bin the corresponding observation falls into.
Step 3: Tally how many observations fall in each bin.
Step 4: Plot frequency of observations for ranked bin.
Bin #
5
3
5
4
3
1
2
5
3
4
4
5
Bin1 Bin2 Bin3 Bin4 Bin5
Bin # Tally
1 1 2 1 3 3 4 3 5 4
12
Talagrand Diagram
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
1 2 3 4 5
Categor y Defi ned by Or der ed E nsembl e Member s
Talagrand Diagram Computation Example
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Four sample ensemble members (E1 – E4) ranked lowest to highest for daily flow (produced from reforecasts using carryover in each year)
Bin #
5
3
5
4
3
1
2
5
3
4
4
5
Bin # Tally
1 1 2 1 3 3 4 3 5 4
Bin1 Bin2 Bin3 Bin4 Bin5
Fre
qu
ency
13
Talagrand Diagram: 25 traces/ensemble, 375 observations
Example: “U-Shaped”Observations too often falling outside ensembleIndicates ensemble spread too small
Example: “L-Shaped”Observations too often larger (smaller) than ensembleIndicates under- (over-) forecasting bias
Example: “N-Shaped” (domed shaped)Observations too rarely falling outside ensembleIndicates ensemble spread is too big
Example: “Flat-Shaped”Observations falling uniformly across ensembleIndicates appropriately sized ensemble distribution
0
5
10
15
20
25
1 3 5 7 9
11
13
15
17
19
21
23
25
Category(defined by ordered ensemble members)
Re
lati
ve
Fre
qu
en
cy
of
An
aly
sis
0
24
6
810
12
14
1618
20
1 3 5 7 9
11
13
15
17
19
21
23
25
Category(defined by ordered ensemble members)
Rel
ativ
e Fr
equ
enc
y o
f Ana
lys
is
0
2
4
6
8
10
12
14
16
18
1 3 5 7 9
11
13
15
17
19
21
23
25
Category(defined by ordered ensemble members)
Re
lati
ve
Fre
qu
en
cy
of
An
aly
sis
0
2
4
6
8
10
12
14
16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Category(defined by ordered ensemble members)
Re
lati
ve F
req
ue
nc
y o
f A
na
lys
is
14
Talagrand Diagram Example: Interpretation?
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Bin #
5
3
5
4
3
1
2
5
3
4
4
5
Bin # Tally
1 1 2 1 3 3 4 3 5 4
Bin1 Bin2 Bin3 Bin4 Bin5
Four sample ensemble members (E1 – E4) ranked lowest to highest for daily flow (produced from reforecasts using carryover in each year)
Talagrand Diagram
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
1 2 3 4 5
Categor y Defi ned by Or der ed E nsembl e Member s
???
Fre
qu
ency
15
Distributions-orientedForecast Evaluation
leads toDiagnostic Verification
It’s all about conditional and marginal distributions!
P(O|F), P(F|O), P(F), P(O)
Reliability, Discrimination, Sharpness, Uncertainty
16
Forecast Reliability -- P(O|F)
For a specified forecast condition, what does the distribution of observations look like?
Forecasted Probability
Rel
ativ
e fr
equ
ency
of
obse
rved
0
1
01
Forecasted Probability
Rel
ativ
e fr
equ
ency
of
obse
rved
0
1
01
User perspective: “When you say 20% chance of flood flows, how often do flood flows actually happen?”
User perspective: “When you say 80% chance of flood flows, how often do flood flows actually happen?”
17
Good reliability – close to diagonal
Sharpness diagram (p(f)) –histogram of forecasts in each probability bin shows shows marginal distribution of forecasts
The reliability diagram is conditioned on the forecasts. That is, given that X was predicted, what was the outcome?
Reliability (Attributes) Diagram – Reliability, Sharpness
18
Reliability Diagram Example Computation
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Step 1: Choose threshold value to base probability forecasts on. For simplicity we’ll choose the mean forecast over all years and all ensembles (= 208).
19
Reliability Diagram Example Computation
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Step 2: Choose how many forecast probability categories to use (5 here: 0,.25,.5,.75,1)
Step 3: For each forecast, calculate the forecast probability below the threshold value.
P(peakfor < 208)
1.0
0.5
0.5
0.0
0.25
0.25
1.0
0.5
0.5
1.0
20
Reliability Diagram Example Computation
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Step 2: Choose how many forecast probability categories to use (5 here: 0,.25,.5,.75,1)
Step 3: For each forecast, calculate the forecast probability below the threshold value.
P(peakfor < 208)
1.0
0.5
0.5
0.0
0.25
0.25
1.0
0.5
0.25
0.5
0.5
1.0
21
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Step 4: Group the observations into groups of equal forecast probability (or, more generally, into forecast probability categories).
P(peakfor < 208)
1.0
0.5
0.5
0.0
0.25
0.25
1.0
0.5
0.25
0.5
0.5
1.0
P(peak < 208) = 0.0
516P(peak < 208) = 0.25
348, 98, 233P(peak < 208) = 0.5
206, 301, 245, 248, 227P(peak < 208) = 0.75
N/A
P(peak < 208) = 0.0
516P(peak < 208) = 0.25
348, 98, 233P(peak < 208) = 0.5
206, 301, 245, 248, 227
Reliability Diagram Example Computation
P(peak < 208) = 1.0
22
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Step 4: Group the observations into groups of equal forecast probability (or, more generally, into forecast probability categories).
P(peakfor < 208)
1.0
0.5
0.5
0.0
0.25
0.25
1.0
0.5
0.25
0.5
0.5
1.0
P(peak < 208) = 0.0
516P(peak < 208) = 0.25
348, 98, 233P(peak < 208) = 0.5
206, 301, 245, 248, 227P(peak < 208) = 0.75
N/A
P(peak < 208) = 0.0
516P(peak < 208) = 0.25
348, 98, 233P(peak < 208) = 0.5
206, 301, 245, 248, 227
P(peak < 208) = 1.0
112, 156, 167
Reliability Diagram Example Computation
23
Step 5: For each group, calculate the frequency of observations above the threshold value, 208 cfs.
P(peak < 208) = 0.0
516P(peak < 208) = 0.25
348, 98, 233P(peak < 208) = 0.5
206, 301, 245, 248, 227P(peak < 208) = 0.75
N/AP(peak < 208) = 1.0
112, 156, 167
P(obs peak < 208 given [P(peakfor < 208) = 0.0]) = 0/1 = 0.0
P(obs peak < 208 given [P(peakfor < 208) = 0.25]) = 1/3 = 0.33
P(obs peak < 208 given [P(peakfor < 208) = 0.5]) = 1/5 = 0.2
P(obs peak < 208 given [P(peakfor < 208) = 1.0]) =
P(obs peak < 208 given [P(peakfor < 208) = 0.75]) = 0/0 = NA
Reliability Diagram Example Computation
24
Step 5: For each group, calculate the frequency of observations above the threshold value, 208 cfs.
P(peak < 208) = 0.0
516P(peak < 208) = 0.25
348, 98, 233P(peak < 208) = 0.5
206, 301, 245, 248, 227P(peak < 208) = 0.75
N/AP(peak < 208) = 1.0
112, 156, 167
P(obs peak < 208 given [P(peakfor < 208) = 0.0]) = 0/1 = 0.0
P(obs peak < 208 given [P(peakfor < 208) = 0.25]) = 1/3 = 0.33
P(obs peak < 208 given [P(peakfor < 208) = 0.5]) = 1/5 = 0.2
P(obs peak < 208 given [P(peakfor < 208) = 1.0]) = 3/3 = 1
P(obs peak < 208 given [P(peakfor < 208) = 0.75]) = 0/0 = NA
Reliability Diagram Example Computation
25
Step 6: Plot centroid of the forecast category (just points in our case) on the x-axis against the observed frequency within each forecast category on the y-axis. Include the 45 degree diagonal for reference.
Reliability Diagram Example Computation
26
Step 7: Include sharpness plot showing the number of observation/forecast pairs in each category.
Reliability Diagram Example Computation
27
Good reliability – close to diagonal
Sharpness diagram (p(f)) –histogram of forecasts in each probability bin shows marginal distribution of forecasts
Good resolution –wide range of frequency of observations corresponding to forecast probabilities
Skill – related to Brier Skill Score, in reference to sample climatology (not historical climatology)
The reliability diagram is conditioned on the forecasts. That is, given that X was predicted, what was the outcome?
Reliability Diagram – Reliability, Sharpness – P(O|F)
28
Overall relative frequency of observations
(sample climatology)
Points closer to perfect-reliability line than to no-resolution line: subsamples of probabilistic forecast contribute positively to overall skill (as defined by BSS) in reference to sample climatology
No-skill line : halfway between perfect-reliability line and no-resolution line, with sample climatology as a reference
Attributes Diagram – Reliability, Resolution, Skill/No-skill
29
Climatology Minimal RESolution Underforecasting
Good RES, at expense of REL
Reliable forecasts of rare event
Small sample size
Source: Wilks (1995)
Interpretation of Reliability DiagramsInterpretation of Reliability Diagrams
30
Interpretation of Reliability DiagramsInterpretation of Reliability Diagrams
Reliability
P[O|F]
Does the frequency of occurrence match your probability statement?
Identifies conditional biasR
elat
ive
freq
uenc
y of
obs
erva
tions
Forecasted probability
No resolution
31
EVS Reliability Diagram Examples
25th Percentile Observed Flows (low flows)
Sharp forecasts, but low resolution
Arkansas-Red Basin, 24-hr flows, lead time 1-14 days
85th Percentile Observed Flows (high flows)
Good reliability at shorter lead times, long-leads miss high events
From: J. Brown, EVS Manual
32
Historical seasonal water supply outlooks
Colorado River Basin
Morrill, Hartmann, and Bales, 2007
33
Forecast probability
Rela
tive F
requency
of
Obse
rvati
ons
Jan 1
2) These months show best reliability; low resolution limiting reliability
1) Few high prob. fcasts, good reliability between 10-70% probability; reliability improves.
Reliability: Colorado Basin ESP Seasonal Supply OutlooksReliability: Colorado Basin ESP Seasonal Supply Outlooks
Apr 1
Mar 1
Jun 1
Jan 1
Apr 1
LC JM (5 mo. lead)
LC MM (3 mo. lead)
LC AM (2 mo. lead)
UC JJy (7 mo. lead)
UC AJy (4 mo. lead)
UC JnJy (2 mo. lead)
3) Reliability decreases for later forecasts as resolution increases; UC good at extremes.
high 30% mid 40% low 30%
Franz, Hartmann, and Sorooshian, 2003
34
For a specified observation category, what do the forecast distributions look like?
Discrimination – P(F|O)
“When dry conditions happen… What do the forecasts usually look like?
You sure hope that forecasts look different when there’s a drought, compared to when there’s a flood!
35
Discrimination – P(F|O)
You sure hope that forecasts look different when there’s a drought, compared to when there’s a flood!
Example: NWS CPC Seasonal climate outlooks, sorted into DRY cases (lowest tercile), 1995-2001, all forecasts, all lead-times
Good discrimination! Not much discrimination!
Forecasted Probability
Rel
ativ
e fr
eque
ncy
of
ind
icat
ed f
orec
ast
Climatology
0.00 0.33 1.00
Probability of dry
Probability of wet
Forecasted Probability
Rel
ativ
e fr
eque
ncy
of
ind
icat
ed f
orec
ast
Climatology
0.00 0.33 1.00
Probability of dry
Probability of wet
36
Rela
tive F
requency
of
Fore
cast
sHigh
Mid-
Low
There is some discrimination…
Early forecasts warned “High flows less likely”
Jan 1
Jan-May
When unusually low flows happened…
P(F|Low flows). Low < 30th percentile
Franz, Hartmann, and Sorooshian (2003)
Forecast probability
Discrimination: Lower Colorado ESP Supply Outlooks
37
Rela
tive F
requency
of
Fore
cast
s
Good Discrimination…
Forecasts were saying:
1) high and mid- flows less likely.
2) Low flows more likely
Jan 1
Forecast probability
Apr 1
Jan-May
Apr-May
High
Mid-
Low
There is some discrimination…
Early forecasts warned “High flows less likely”
Discrimination: Lower Colorado ESP Supply Outlooks
When unusually low flows happened…
P(F|Low flows). Low < 30th percentile
Franz, Hartmann, and Sorooshian (2003)
38
Rela
tive F
requency
of
Fore
cast
s
high 30%
mid 40%
low 30%
1)High flows less likely.
2) No discrimination between mid and low flows.3) Both UC and LC show good discrimination for low flows at 2-month lead time.
Jan 1
Forecast probability
Apr 1
Lower Colorado Basin Jan-May (5 mo. lead)
April-May (2 mo. lead)
Jan 1
Jun 1
Upper Colorado Basin Jan-July (7 mo. lead)
June-July (2 mo. lead)
For observed flows in lowest 30% of historic distribution
Discrimination: Colorado Basin ESP Supply Outlooks
Franz, Hartmann, and Sorooshian (2003)
39
Historical seasonal water supply outlooks
Colorado River Basin
40
All observation CDF is plotted and color coded by tercile.
Forecast ensemble members are sorted into 3 groups according to which tercile its associated observation falls into.
The CDF for each group is plotted in the appropriate color. i.e. high is blue.
Discrimination: CDF Perspective
Credit: K. Werner
41
In this case, there is relatively good discrimination since the three conditional forecast CDFs separate themselves.
Discrimination
Credit: K. Werner
42
Discrimination Example Computation
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Step 1: Order observations and divide ordered list into categories. Here we will use terciles (≤ 167, 206 ≤ ≤ 245, ≥ 248) .
OBS Tercile
Low
Middle
High
High
High
Low
Low
Middle
Middle
High
Middle
LowCredit: K. Werner
43
Discrimination Example Computation
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Step 2: Group forecast ensemble members according to OBS tercile.
Low OBS Forecasts:42, 74, 82, 90, 114, 277, 351, 356,98, 170, 204, 205, 94,135, 156, 158
OBS Tercile
Low
Middle
High
High
High
Low
Low
Middle
Middle
High
Middle
LowCredit: K. Werner
44
Discrimination Example Computation
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Mid OBS Forecasts:65, 143, 223, 227, 69, 169, 229, 236, 94, 219, 267, 270,
OBS Tercile
Low
Middle
High
High
High
Low
Low
Middle
Middle
High
Middle
LowCredit: K. Werner
Step 2: Group forecast ensemble members according to OBS tercile.
45
Discrimination Example Computation
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Mid OBS Forecasts:65, 143, 223, 227, 69, 169, 229, 236, 94, 219, 267, 270, 108, 189, 227, 228
OBS Tercile
Low
Middle
High
High
High
Low
Low
Middle
Middle
High
Middle
LowCredit: K. Werner
Step 2: Group forecast ensemble members according to OBS tercile.
46
Discrimination Example Computation
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Hi OBS Forecasts:82, 192, 295, 300,
142, 291, 349, 356, 59, 175, 244, 250
OBS Tercile
Low
Middle
High
High
High
Low
Low
Middle
Middle
High
Middle
LowCredit: K. Werner
Step 2: Group forecast ensemble members according to OBS tercile.
47
Discrimination Example Computation
YEAR E1 E2 E3 E4
1981 42 74 82 90
1982 65 143 223 227
1983 82 192 295 300
1984 211 397 514 544
1985 142 291 349 356
1986 114 277 351 356
1987 98 170 204 205
1988 69 169 229 236
1989 94 219 267 270
1990 59 175 244 250
1991 108 189 227 228
1992 94 135 156 158
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Hi OBS Forecasts:82, 192, 295, 300, 211, 397, 514, 544142, 291, 349, 356, 59, 175, 244, 250
OBS Tercile
Low
Middle
High
High
High
Low
Low
Middle
Middle
High
Middle
LowCredit: K. Werner
Step 2: Group forecast ensemble members according to OBS tercile.
48
Discrimination Example Computation
OBS
112
206
301
516
348
98
156
245
233
248
227
167
Step 3: Plot all-observation CDF color coded by tercile (≤ 167, 206 ≤ ≤ 245, ≥ 248).
Credit: K. Werner
OBS Tercile
Low
Middle
High
High
High
Low
Low
Middle
Middle
High
Middle
Low
49
Step 4: Add forecasts conditioned on observed terciles CDFs to plot.
Low OBS Forecasts:42, 74, 82, 90, 114, 277, 351, 356,98, 170, 204, 205, 94, 135, 156, 158Mid OBS Forecasts:65, 143, 223, 227,69, 169, 229, 236,94, 219, 267, 270,108, 189, 227, 228Hi OBS Forecasts:82, 192, 295, 300,211, 397, 514, 544,142, 291, 349, 356,59, 175, 244, 250
Discrimination Example Computation
Credit: K. Werner
50
Step 5: Discrimination is shown by the degree to which the conditional forecast CDFs are separated from each other.
In this case, high forecasts discriminate better than mid and low forecasts.
Discrimination Example Computation
Credit: K. Werner
51
How well do April – July volume forecasts discriminate when they are made in Jan, Mar, and May?
Poor discrimination in Jan between forecasting high and medium flows. Best discrimination in May.
Discrimination
Credit: K. Werner
52
Another way to look at discrimination using PDF’s in lieu of CDF’s.
The more separation between the PDF’s the better the discrimination.
Discrimination
Credit: K. Werner
53
Deterministic forecasts
• traditional in hydrology
• sub-optimal for decision making
Common perspective
“Deterministic model simulations and probabilistic forecasts … are two entirely different types of products. Direct comparison of probabilistic forecasts with deterministic single valued forecasts is extremely difficult”
Comparing Deterministic & Probabilistic ForecastsComparing Deterministic & Probabilistic Forecasts
- Anonymous
54
How can we compare deterministic and probabilistic forecasts?
Deterministic
Probabilistic
Source: XEFS Design Team, 2007
Option: Use ensemble median with standard metrics – No! x
55From: A. Hamlet, University of Washington
The ensemble mean minimizes error, but doesn’t represent the overall behavior.
“Pretend Determinism”
56
What’s wrong with using ‘deterministic’ metrics?
Metrics using only central tendency of each forecast pdf fail to distinguish between forecasts 1-3, but will identify 4 as inferior.Metrics that reward accuracy but punish spread will rank the forecast skill from 1 to 4.
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
0 0.5 1 1.5 2 2.5
Obs Value
3
21
4
From: A. Hamlet, University of Washington
57
How can we compare deterministic and probabilistic forecasts?
Deterministic
Probabilistic
Source: XEFS Design Team, 2007
Option: Use ensemble median with standard metrics – No! x
58
PD
F
Climatologydistribution
Forecast distribution
Tercile boundaries(equal probability) Deterministic
forecast
Jack-knife calibration error =PDF of error distribution
can determine any quantiles
Deterministic vs. Probabilistic Forecasts
Observation
Flow, QApproach used by Morrill, Hartmann, Bales 2007
59
Lab Session -- Group Exercise
Choose a set of forecasts.
Develop strategies for verifying these forecasts from two perspectives:
- Users
- Forecasters during operations
Report back to group.
Repeat for second set of forecasts, if time permits.