statistical significance, confidence, uncertainty · 2019-04-30 · practical vs. statistical...
Statistical significance, confidence, uncertainty
Tressa L. Fowler
Accounting for Uncertainty
• Observational
• Model
  • Model parameters
  • Physics
  • Verification scores
• Sampling
  • Verification statistic is a realization of a random process
  • What if the experiment were re-run under identical conditions? Would you get the same answer?
Uncertainty estimates are among a long list of important verification practices
• Well defined questions or goals.
• Large, representative, (identical?) sample.
• Consistent, independent observations.
• Appropriate methods and statistics.
• Uncertainty estimates.
• Spatial, temporal, and conditional differences evaluated.
• User relevant results.
• Thoroughly tested software.
You can’t fix by analysis what you bungled by design. - Light, Singer and Willett.
Define question(s) first. Then the confidence interval is around the right statistic.
• Which model is best?
• Is my model upgrade an improvement?
• How frequently are ceilings in the correct category?
Practical vs. statistical significance
• May not be the same. Why?
  • Failure to use significant figures.
  • Very large sample sizes.
  • Stats assumes independent samples, but weather rarely delivers.
• Which do you need? Both!
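The effect of very large sample sizes can be illustrated with a small sketch. It assumes a simple two-sided z-test with a known standard deviation and independent samples; the 0.02 K difference and 2 K spread are hypothetical numbers chosen only for illustration.

```python
import math
from statistics import NormalDist

def z_test_mean_difference(diff, sd, n):
    """Two-sided z-test that a mean difference is zero.

    Assumes n independent samples with known standard deviation sd.
    """
    se = sd / math.sqrt(n)                      # standard error of the mean
    z = diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical case: a 0.02 K difference in mean error, spread of 2 K.
results = {}
for n in (100, 10_000, 1_000_000):
    z, p = z_test_mean_difference(diff=0.02, sd=2.0, n=n)
    results[n] = (z, p)
    print(f"n={n:>9,d}  z={z:6.2f}  p={p:.4f}")
```

The same tiny difference is far from significant at n = 100 and overwhelmingly significant at n = 1,000,000, even though a 0.02 K change is unlikely to matter to any user. The test says nothing about whether the difference is practically important.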
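Two ways to examine scores
• CI about actual scores: it may be difficult to differentiate model performance.
• CI about pairwise differences: may allow for better differentiation of model performance.
[Figure: scores with CIs for Model 1 and Model 2, and for the difference (Model 1 - Model 2). The difference is statistically significant (SS) where its CI does not encompass 0.]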
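A minimal sketch of the two approaches, using synthetic scores in which both models share the same day-to-day variability. The data, the model offsets, and the helper name `normal_ci` are assumptions made for illustration, and the intervals assume independent cases.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def normal_ci(values, alpha=0.05):
    """Normal-approximation CI for the mean of independent values."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * stdev(values) / math.sqrt(len(values))
    m = mean(values)
    return m - half_width, m + half_width

rng = random.Random(0)
# Synthetic daily scores: a shared "difficulty" term dominates both models.
difficulty = [rng.gauss(0.0, 1.0) for _ in range(200)]
model1 = [3.0 + d + rng.gauss(0.0, 0.1) for d in difficulty]
model2 = [2.9 + d + rng.gauss(0.0, 0.1) for d in difficulty]
diffs = [a - b for a, b in zip(model1, model2)]   # paired, case by case

ci1 = normal_ci(model1)
ci2 = normal_ci(model2)
ci_diff = normal_ci(diffs)
print("Model 1 CI:   ", ci1)
print("Model 2 CI:   ", ci2)
print("Difference CI:", ci_diff)
```

The two per-model intervals overlap heavily because each one carries the shared case-to-case variability, while the interval about the paired differences removes it and excludes zero.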
Confidence Intervals (CIs)
“If we re-run the experiment N times, and create N (1-α)100% CI’s, then we expect the true value of the parameter to fall inside (1-α)100% of the intervals.”
Confidence intervals can be parametric or non-parametric…
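The repeated-experiment interpretation quoted above can be checked by simulation. This is a sketch under simple assumptions: independent Gaussian samples, a normal-approximation interval for the mean, and a hypothetical true mean of 1.5.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def coverage(n_experiments, n_samples, true_mean, alpha, seed=0):
    """Fraction of simulated experiments whose CI contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = 0
    for _ in range(n_experiments):
        # "Re-run the experiment": draw a fresh sample each time.
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n_samples)]
        m = mean(sample)
        half_width = z * stdev(sample) / math.sqrt(n_samples)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / n_experiments

frac = coverage(n_experiments=2000, n_samples=50, true_mean=1.5, alpha=0.05)
print(f"fraction of intervals containing the true mean: {frac:.3f}")
```

The printed fraction should land close to 0.95. Any single interval either contains the true value or it does not; the confidence level describes the procedure over many repetitions.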
Types of Confidence Intervals
Bootstrap
• Available for almost any statistic.
• More robust to outliers.
• Sensitive to lack of continuity, small samples.
Parametric (normal)
• Sensitive to departures from assumed distribution.
• Often sensitive to outliers.
• Not available for some statistics.
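Normal Approximation CI’s
$$\hat{\theta} \pm z_{\alpha/2}\,\mathrm{se}(\hat{\theta})$$
is a (1-α)100% Normal CI for θ, where
• θ̂ is the estimate of the statistic of interest (e.g., the forecast mean)
• se(θ̂) is the standard error for the statistic
• z_v is the v-th quantile of the standard normal distribution, where v = α/2.
• A typical value of α is 0.05, so the (1-α)100% interval is referred to as the 95% Normal CI.
The interval comes from treating
$$z = \frac{\hat{\theta} - \theta}{\mathrm{se}(\hat{\theta})}$$
as a standard normal variate, where θ̂ is the estimate and θ is the population (“true”) parameter.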
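As a worked illustration, the normal-approximation interval (the estimate plus or minus a standard normal quantile times the standard error) can be computed for a mean error. The error values and the function name are hypothetical, and the sketch assumes independent samples.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_ci_for_mean(errors, alpha=0.05):
    """(1 - alpha) * 100% normal-approximation CI for the mean of `errors`."""
    n = len(errors)
    theta_hat = mean(errors)                      # the estimate
    se = stdev(errors) / math.sqrt(n)             # standard error of the mean
    # Upper quantile z_{1 - alpha/2}; equal to -z_{alpha/2} by symmetry.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return theta_hat - z * se, theta_hat + z * se

# Hypothetical forecast-minus-observation errors.
errors = [0.4, -0.2, 0.1, 0.7, -0.5, 0.3, 0.0, 0.6, -0.1, 0.2]
lower, upper = normal_ci_for_mean(errors)
print(f"mean error = {mean(errors):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With only ten samples a Student-t quantile would usually be preferred over the normal quantile; the normal form is kept here to match the formula on the slide.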
Normal Approximation CI’s
[Figure: normal approximation CI, labeled with θ, se(θ), and z_{α/2}.]
Application of Normal Approximation CI’s
• Independence assumption (i.e., “iid”) – temporal and spatial
  • Should check the validity of the independence assumption
  • MET accounts for first order temporal correlation
• Normal distribution assumption
  • Should check validity of the normal distribution (e.g., qq-plots, other methods)
  • MET does not do this – should be done outside of MET
  • However… MET applies appropriate approaches to verification statistics
• Multiple testing
  • When computing many confidence intervals, the true significance levels are affected (reduced) by the number of tests that are done.
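One common way to account for first-order temporal correlation is to replace the sample size with an effective sample size based on the lag-1 autocorrelation. The sketch below uses that textbook adjustment on a synthetic AR(1) series; it is an assumption for illustration and is not claimed to reproduce MET's exact computation. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a time series."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

def autocorr_adjusted_ci(x, alpha=0.05):
    """Normal CI for the mean, widened for positive lag-1 autocorrelation."""
    n = len(x)
    m = mean(x)
    # Clamp negative estimates to zero so the interval is never narrowed.
    r1 = max(lag1_autocorrelation(x), 0.0)
    n_eff = n * (1.0 - r1) / (1.0 + r1)           # effective sample size
    var = sum((a - m) ** 2 for a in x) / (n - 1)
    se = math.sqrt(var / n_eff)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return m - z * se, m + z * se, r1, n_eff

# Synthetic AR(1) series with coefficient 0.6 (hypothetical daily errors).
rng = random.Random(1)
series = [0.0]
for _ in range(499):
    series.append(0.6 * series[-1] + rng.gauss(0.0, 1.0))

lower, upper, r1, n_eff = autocorr_adjusted_ci(series)
print(f"r1={r1:.2f}  n={len(series)}  n_eff={n_eff:.0f}  CI=({lower:.2f}, {upper:.2f})")
```

With a lag-1 correlation near 0.6, the 500 values carry roughly the information of about a quarter as many independent values, so treating them as iid would make the interval too narrow.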
Normal Approximation CI’s
• Normal approximation is appropriate for numerous verification measures
  Examples: Mean error, Correlation, ACC, BASER, POD, FAR, CSI
• Alternative CI estimates are available for other types of variables
  Examples: forecast/observation variance, GSS, HSS, FBIAS, Brier Score
• All approaches expect the sample values to be independent and identically distributed.
IID Bootstrap Algorithm
(Nonparametric) Bootstrap CI’s
1. Resample with replacement from the sample (forecast and observation pairs), x1, x2, ..., xn
2. Calculate the verification statistic(s) of interest from the resample in step 1.
3. Repeat steps 1 and 2 many times, say B times, to obtain a sample of the verification statistic(s) θB.
4. Estimate (1-α)100% CI’s from the sample in step 3.
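The four steps above can be sketched as a percentile bootstrap. The percentile rule used for step 4 is one of several options and is an assumption here; the synthetic forecast and observation pairs and the function names are likewise hypothetical.

```python
import random
from statistics import mean

def percentile_bootstrap_ci(pairs, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `statistic` over (forecast, obs) pairs."""
    rng = random.Random(seed)
    n = len(pairs)
    replicates = []
    for _ in range(n_resamples):                                 # step 3
        resample = [pairs[rng.randrange(n)] for _ in range(n)]   # step 1
        replicates.append(statistic(resample))                   # step 2
    replicates.sort()
    # Step 4: read the bounds off the sorted replicates.
    lower_idx = int((alpha / 2.0) * n_resamples)
    upper_idx = max(lower_idx, int((1.0 - alpha / 2.0) * n_resamples) - 1)
    return replicates[lower_idx], replicates[upper_idx]

def mean_error(pairs):
    """Mean of forecast minus observation."""
    return mean(f - o for f, o in pairs)

# Synthetic pairs: forecasts carry a 0.5 bias plus noise.
data_rng = random.Random(7)
obs = [data_rng.gauss(10.0, 3.0) for _ in range(100)]
pairs = [(o + 0.5 + data_rng.gauss(0.0, 0.5), o) for o in obs]

lower, upper = percentile_bootstrap_ci(pairs, mean_error)
print(f"mean error = {mean_error(pairs):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Resampling whole pairs keeps each forecast matched to its observation. Like the algorithm on the slide, this sketch treats the pairs as iid, so it does not account for temporal or spatial correlation.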
[Figure: empirical distribution (histogram) of the values of statistic θB calculated on repeated samples. The bounds for a 90% CI leave 5% in each tail.]
Bootstrap CI Considerations
• Number of points impacts speed of bootstrap
  • Grid-based typically uses more points than Point-based
  • THUS: Bootstrap is quicker with Point-based
• Number of resamples impacts speed of bootstrap
  • Recommended value is 1000
  • If you need to reduce – try to determine where solutions converge to pick your value
• Bootstrap can be disabled in MET, if concerned about compute speed - check status in config file before running
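One way to look for convergence is to recompute the interval for increasing numbers of resamples and watch the bounds settle. This sketch does that outside of MET on synthetic errors; the data, the candidate values of B, and the function name are assumptions for illustration.

```python
import random
from statistics import mean

def bootstrap_ci_of_mean(values, n_resamples, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, assuming iid values."""
    rng = random.Random(seed)
    n = len(values)
    reps = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower_idx = int((alpha / 2.0) * n_resamples)
    upper_idx = max(lower_idx, int((1.0 - alpha / 2.0) * n_resamples) - 1)
    return reps[lower_idx], reps[upper_idx]

# Synthetic errors standing in for a verification sample.
data_rng = random.Random(42)
errors = [data_rng.gauss(0.3, 1.0) for _ in range(200)]

results = {}
for b in (50, 100, 250, 500, 1000):
    results[b] = bootstrap_ci_of_mean(errors, n_resamples=b)
    lo, hi = results[b]
    print(f"B={b:5d}  CI=({lo:.3f}, {hi:.3f})")
```

If the bounds stop changing appreciably well below 1000 resamples, a smaller value may be a reasonable trade for speed. The point at which that happens depends on the statistic and the sample, so it is worth checking per application.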
METViewer alternatives
• Two types of parametric intervals available where appropriate.
  • Accumulate scores (e.g. overall average), find parametric interval.
  • Summarize scores (e.g. find average or median value of all daily POD values), find interval appropriate for average or median.
• Bootstrap the statistics for each field over time.
  • Measures (between-field) uncertainty of the estimates over time, rather than the within field uncertainty.
• Pairwise difference statistics and intervals (with event equalization).
  • Gives more power to detect differences by eliminating case to case variability.
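The difference between accumulating and summarizing scores can be seen with a few hypothetical daily contingency counts. This is a plain illustration of the two calculations for POD, not METViewer code, and the counts are invented.

```python
from statistics import mean

# Hypothetical daily (hits, misses) counts for one event threshold.
daily_counts = [(8, 2), (1, 3), (45, 5), (0, 1), (12, 8)]

def pod(hits, misses):
    """Probability of detection: hits / (hits + misses)."""
    return hits / (hits + misses)

# Accumulate: pool the counts over all days, then compute one POD.
total_hits = sum(h for h, _ in daily_counts)
total_misses = sum(m for _, m in daily_counts)
accumulated_pod = pod(total_hits, total_misses)

# Summarize: compute POD per day, then average the daily values.
daily_pods = [pod(h, m) for h, m in daily_counts]
summarized_pod = mean(daily_pods)

print(f"accumulated POD = {accumulated_pod:.3f}")   # 66 / 85, about 0.776
print(f"mean daily POD  = {summarized_pod:.3f}")    # about 0.510
```

The accumulated value is dominated by days with many events, while the mean of daily values weights every day equally. The two answer different questions, which is why each needs its own kind of interval.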
Conclusions
• Uncertainty estimates are an essential part of good verification evaluations.
• All estimates are wrong, some estimates are useful.
• MET and METViewer developers strive to provide the most correct and useful intervals for output statistics.
Appendix C of MET Documentation: http://www.dtcenter.org/met/users/docs/overview.php
References and further reading
• Gilleland, E., 2010: Confidence intervals for forecast verification. NCAR Technical Note NCAR/TN-479+STR, 71 pp. Available at: https://opensky.ucar.edu/islandora/object/technotes%3A491
• Jolliffe and Stephenson (2011): Forecast verification: A practitioner’s guide, 2nd Edition, Wiley & Sons
• JWGFVR (2009): Recommendation on verification of precipitation forecasts. WMO/TD report, no. 1485 WWRP 2009-1
• Nurmi (2003): Recommendations on the verification of local weather forecasts. ECMWF Technical Memorandum, no. 430
• Wilks (2012): Statistical methods in the atmospheric sciences, ch. 7. Academic Press
See also http://www.cawcr.gov.au/projects/verification/