statistical significance, confidence, uncertainty · 2019-04-30 · practical vs. statistical...

Statistical significance, confidence, uncertainty

Tressa L. Fowler

Accounting for Uncertainty

• Observational
• Model
  • Model parameters
  • Physics
  • Verification scores
• Sampling
  • Verification statistic is a realization of a random process
  • What if the experiment were re-run under identical conditions? Would you get the same answer?

Uncertainty estimates are among a long list of important verification practices

• Well defined questions or goals.
• Large, representative, (identical?) sample.
• Consistent, independent observations.
• Appropriate methods and statistics.
• Uncertainty estimates.
• Spatial, temporal, and conditional differences evaluated.
• User relevant results.
• Thoroughly tested software.

You can’t fix by analysis what you bungled by design. - Light, Singer and Willett.

Define question(s) first. Then the confidence interval is around the right statistic.

• Which model is best?
• Is my model upgrade an improvement?
• How frequently are ceilings in the correct category?

Practical vs. statistical significance

• May not be the same. Why?
  • Failure to use significant figures.
  • Very large sample sizes.
  • Stats assumes independent samples, but weather rarely delivers.
• Which do you need? Both!
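To illustrate how very large samples can separate statistical from practical significance, the sketch below computes a normal-approximation 95% CI for a fixed, tiny mean difference as the sample size grows. The numbers (a difference of 0.01 with standard deviation 1.0) and the function name are hypothetical, chosen only for illustration, and the samples are assumed independent.

```python
import math
from statistics import NormalDist

def normal_ci_for_mean(mean, sd, n, alpha=0.05):
    """Normal-approximation (1 - alpha) CI for a mean, assuming iid samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical numbers: a mean score difference of 0.01, standard deviation 1.0.
diff, sd = 0.01, 1.0
for n in (100, 10_000, 1_000_000):
    lo, hi = normal_ci_for_mean(diff, sd, n)
    excludes_zero = lo > 0 or hi < 0
    print(f"n={n:>9}: CI=({lo:+.4f}, {hi:+.4f})  excludes 0: {excludes_zero}")
```

With enough cases the interval excludes 0, yet a difference of 0.01 may still be too small to matter to a user. That judgment is the practical-significance question.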

Two ways to examine scores

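• CIs about pairwise differences may allow for better differentiation of model performance.
• CIs about the actual scores may make it difficult to differentiate model performance.

[Figure: CIs for the scores of Model 1 and Model 2, and for the difference Model 1 - Model 2. The difference is statistically significant (SS) where its CI does not encompass 0.]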
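A minimal sketch of the two ways to examine scores, using synthetic data in which both models share a large case-to-case component. All numbers and names here are hypothetical, and the normal-approximation interval for a mean is used only as a convenient stand-in for whichever CI method is appropriate.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mean_ci(values, alpha=0.05):
    """Normal-approximation (1 - alpha) CI for the mean of iid values."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * stdev(values) / math.sqrt(len(values))
    centre = mean(values)
    return centre - half_width, centre + half_width

rng = random.Random(0)
# Synthetic errors: a shared "case difficulty" dominates both models' scores.
case_difficulty = [rng.gauss(2.0, 1.0) for _ in range(200)]
model1 = [d + rng.gauss(0.0, 0.1) for d in case_difficulty]
model2 = [d + 0.1 + rng.gauss(0.0, 0.1) for d in case_difficulty]

ci1 = mean_ci(model1)
ci2 = mean_ci(model2)
ci_diff = mean_ci([a - b for a, b in zip(model1, model2)])

overlap = ci1[0] <= ci2[1] and ci2[0] <= ci1[1]
print(f"Model 1 CI: ({ci1[0]:.3f}, {ci1[1]:.3f})")
print(f"Model 2 CI: ({ci2[0]:.3f}, {ci2[1]:.3f})  overlap: {overlap}")
print(f"Diff CI:    ({ci_diff[0]:.3f}, {ci_diff[1]:.3f})")
```

The CIs on the actual scores overlap because both contain the case-to-case variability. Pairing removes that shared component, so the CI on the differences is much narrower.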

Confidence Intervals (CIs)

“If we re-run the experiment N times, and create N (1-α)100% CIs, then we expect the true value of the parameter to fall inside (1-α)100% of the intervals.”
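This repeated-experiment interpretation can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the original material: it draws many iid normal samples with a known mean, builds a normal-approximation 95% CI for the mean from each one, and counts how often the interval contains the true value. The sample size, number of experiments, and function name are all arbitrary choices.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def empirical_coverage(true_mean=0.0, sd=1.0, n=50,
                       n_experiments=2000, alpha=0.05, seed=1):
    """Fraction of simulated normal CIs for the mean that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    hits = 0
    for _ in range(n_experiments):
        # One "re-run of the experiment": a fresh iid sample of size n.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        half_width = z * stdev(sample) / math.sqrt(n)
        centre = mean(sample)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / n_experiments

print(f"Empirical coverage of nominal 95% CIs: {empirical_coverage():.3f}")
```

The printed fraction should be close to, though not exactly, 0.95.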

Confidence intervals can be parametric or non-parametric…

Types of Confidence Intervals

Bootstrap
• Available for almost any statistic.
• More robust to outliers.
• Sensitive to lack of continuity, small samples.

Parametric (normal)
• Sensitive to departures from assumed distribution.
• Often sensitive to outliers.
• Not available for some statistics.

Normal Approximation CIs

$$\hat{\theta} \pm z_{\alpha/2}\,\mathrm{se}(\hat{\theta})$$

is a (1-α)100% normal CI for θ, where
• θ̂ is the estimate of the statistic of interest (e.g., the forecast mean)
• se(θ̂) is the standard error of the statistic
• z_v is the v-th quantile of the standard normal distribution (the standard normal variate), where v = α/2
• θ is the population (“true”) parameter
• A typical value of α is 0.05, so the (1-α)100% interval is referred to as the 95% normal CI
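As a worked example of the normal-approximation interval, assuming the usual form "estimate ± quantile × standard error", take hypothetical values that are not from the slides: an estimate of 2.0 with standard error 0.5 and α = 0.05, so that the α/2 quantile of the standard normal is approximately -1.96.

```latex
\hat{\theta} \pm z_{\alpha/2}\,\mathrm{se}(\hat{\theta})
  = 2.0 \pm (-1.96)(0.5)
  = 2.0 \pm 0.98
  \quad\Longrightarrow\quad
  95\%\ \text{CI} = (1.02,\ 2.98)
```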

Application of Normal Approximation CIs

• Independence assumption (i.e., “iid”) – temporal and spatial
  • Should check the validity of the independence assumption
  • MET accounts for first order temporal correlation
• Normal distribution assumption
  • Should check validity of the normal distribution (e.g., qq-plots, other methods)
  • MET does not do this – should be done outside of MET
  • However… MET applies appropriate approaches to verification statistics
• Multiple testing
  • When computing many confidence intervals, the true significance levels are affected (reduced) by the number of tests that are done.
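The multiple-testing point can be made concrete with a small calculation. The sketch below assumes the m intervals are independent, which is rarely exactly true in practice, and computes the probability that all of them cover their targets. It also shows the Bonferroni adjustment, a common conservative correction that the slides do not name and that is included here only as an illustration.

```python
def joint_coverage(alpha, m):
    """Probability that m independent (1 - alpha) CIs all cover their targets."""
    return (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-interval alpha so that the family-wise error rate is at most alpha."""
    return alpha / m

for m in (1, 5, 20):
    unadjusted = joint_coverage(0.05, m)
    adjusted = joint_coverage(bonferroni_alpha(0.05, m), m)
    print(f"m={m:>2}: joint coverage {unadjusted:.3f} unadjusted, {adjusted:.3f} with Bonferroni")
```

With 20 nominal 95% intervals, the chance that every one of them covers its target drops to roughly 36% under independence.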


• Normal approximation is appropriate for numerous verification measures
  Examples: Mean error, Correlation, ACC, BASER, POD, FAR, CSI
• Alternative CI estimates are available for other types of variables
  Examples: forecast/observation variance, GSS, HSS, FBIAS, Brier Score
• All approaches expect the sample values to be independent and identically distributed.
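When the iid assumption fails because of temporal correlation, one textbook adjustment for the CI of a mean is to replace the sample size n with an effective sample size n' = n(1 - r1)/(1 + r1), where r1 is the lag-1 autocorrelation. The sketch below shows that adjustment on synthetic data. It is an assumption-laden illustration and is not claimed to be how MET implements its first-order correction.

```python
import random
from statistics import mean

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a time series."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series[:-1], series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den

def effective_sample_size(series):
    """n' = n (1 - r1) / (1 + r1), for the mean of an AR(1)-like series."""
    # Assumption: negative estimates of r1 are clamped to 0 (no inflation of n).
    r1 = max(lag1_autocorrelation(series), 0.0)
    return len(series) * (1 - r1) / (1 + r1)

# Synthetic AR(1) errors with lag-1 correlation 0.6 (hypothetical data).
rng = random.Random(42)
errors = [rng.gauss(0.0, 1.0)]
for _ in range(1999):
    errors.append(0.6 * errors[-1] + rng.gauss(0.0, 1.0))

n_eff = effective_sample_size(errors)
print(f"n = {len(errors)}, effective n = {n_eff:.0f}")
# The standard error of the mean would then use sqrt(n_eff) instead of sqrt(n).
```

Using the smaller effective sample size widens the interval, which reflects that correlated cases carry less independent information.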

IID Bootstrap Algorithm

(Nonparametric) Bootstrap CIs

1. Resample with replacement from the sample (forecast and observation pairs), x_1, x_2, ..., x_n.
2. Calculate the verification statistic(s) of interest from the resample in step 1.
3. Repeat steps 1 and 2 many times, say B times, to obtain a sample of the verification statistic(s) θ_B.
4. Estimate (1-α)100% CIs from the sample in step 3.
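The four steps above can be sketched as follows. This is a minimal illustration, not MET's implementation: step 4 uses the simple percentile method (one of several ways to turn bootstrap replicates into an interval), the data are synthetic, and the function names are hypothetical.

```python
import random

def bootstrap_ci(pairs, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for statistic(pairs), assuming iid pairs."""
    rng = random.Random(seed)
    n = len(pairs)
    replicates = []
    for _ in range(n_resamples):                                # step 3
        resample = [pairs[rng.randrange(n)] for _ in range(n)]  # step 1
        replicates.append(statistic(resample))                  # step 2
    replicates.sort()
    # Step 4: cut off a fraction alpha/2 of the replicates in each tail.
    lo_idx = round(alpha / 2 * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return replicates[lo_idx], replicates[hi_idx]

def mean_error(pairs):
    """Mean of forecast minus observation."""
    return sum(f - o for f, o in pairs) / len(pairs)

# Hypothetical data: forecasts biased high by about 0.5.
rng = random.Random(1)
obs = [rng.gauss(10.0, 2.0) for _ in range(200)]
pairs = [(o + 0.5 + rng.gauss(0.0, 0.5), o) for o in obs]

lo, hi = bootstrap_ci(pairs, mean_error)
print(f"mean error = {mean_error(pairs):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Forecast and observation are resampled together as a pair, so the relationship between them is preserved in every resample. Any other statistic that takes a list of pairs can be passed in place of `mean_error`.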

[Figure: Empirical distribution (histogram) of the statistic θ_B calculated on repeated samples. The bounds for a 90% CI cut off 5% of the values in each tail.]

Bootstrap CI Considerations

• Number of points impacts speed of bootstrap
  • Grid-based typically uses more points than Point-based
  • THUS: Bootstrap is quicker with Point-based
• Number of resamples impacts speed of bootstrap
  • Recommended value is 1000
  • If you need to reduce – try to determine where solutions converge to pick your value
• Bootstrap can be disabled in MET, if concerned about compute speed - check status in config file before running
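One way to see where the solutions converge is to generate a large set of bootstrap replicates once and then recompute the interval bounds from the first B of them for increasing B. The sketch below does this with synthetic scores and the percentile method. Both are assumptions for illustration, and the code does not use or imitate any MET configuration option.

```python
import random
from statistics import mean

def bootstrap_replicates(sample, statistic, n_resamples, seed=0):
    """Return n_resamples bootstrap replicates of statistic(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic([sample[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_resamples)]

def percentile_bounds(replicates, alpha=0.05):
    """Percentile CI bounds from a list of bootstrap replicates."""
    ordered = sorted(replicates)
    b = len(ordered)
    return ordered[round(alpha / 2 * b)], ordered[round((1 - alpha / 2) * b) - 1]

rng = random.Random(7)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]  # hypothetical scores
replicates = bootstrap_replicates(sample, mean, 2000)

# Reuse the first B replicates to see how the bounds settle as B grows.
for b in (100, 200, 500, 1000, 2000):
    lo, hi = percentile_bounds(replicates[:b])
    print(f"B={b:>4}: ({lo:+.3f}, {hi:+.3f})")
```

If the bounds stop changing at the precision you report well before B = 1000, a smaller number of resamples may be adequate for that data set and statistic.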

METViewer alternatives

• Two types of parametric intervals available where appropriate.
  • Accumulate scores (e.g. overall average), find parametric interval.
  • Summarize scores (e.g. find average or median value of all daily POD values), find interval appropriate for average or median.
• Bootstrap the statistics for each field over time.
  • Measures (between-field) uncertainty of the estimates over time, rather than the within field uncertainty.
• Pairwise difference statistics and intervals (with event equalization).
  • Gives more power to detect differences by eliminating case to case variability.

Conclusions

• Uncertainty estimates are an essential part of good verification evaluations.
• All estimates are wrong, some estimates are useful.
• MET and METViewer developers strive to provide the most correct and useful intervals for output statistics.

Appendix C of MET Documentation: http://www.dtcenter.org/met/users/docs/overview.php

References and further reading
• Gilleland, E., 2010: Confidence intervals for forecast verification. NCAR Technical Note NCAR/TN-479+STR, 71 pp. Available at: https://opensky.ucar.edu/islandora/object/technotes%3A491
• Jolliffe and Stephenson (2011): Forecast verification: A practitioner’s guide, 2nd Edition, Wiley & Sons
• JWGFVR (2009): Recommendation on verification of precipitation forecasts. WMO/TD report, no. 1485 WWRP 2009-1
• Nurmi (2003): Recommendations on the verification of local weather forecasts. ECMWF Technical Memorandum, no. 430
• Wilks (2012): Statistical methods in the atmospheric sciences, ch. 7. Academic Press

See also http://www.cawcr.gov.au/projects/verification/
