4/14/2015 Scientific method: Statistical errors : Nature News & Comment
http://www.nature.com/news/scientificmethodstatisticalerrors1.14700?WT.ec_id=NATURE20140213 1/12
NATURE | NEWS FEATURE
Scientific method: Statistical errors
P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume.
12 February 2014
For a brief moment in 2010, Matt Motyl was on the brink of scientific glory: he had discovered that extremists quite literally see the world in black and white.
The results were "plain as day", recalls Motyl, a psychology PhD student at the University of Virginia in Charlottesville. Data from a study of nearly 2,000 people seemed to show that political moderates saw shades of grey more accurately than did either left-wing or right-wing extremists. "The hypothesis was sexy," he says, "and the data provided clear support." The P value, a common index for the strength of evidence, was 0.01, usually interpreted as 'very significant'. Publication in a high-impact journal seemed within Motyl's grasp.
But then reality intervened. Sensitive to controversies over reproducibility, Motyl and his adviser, Brian Nosek, decided to replicate the study. With extra data, the P value came out as 0.59, not even close to the conventional level of significance, 0.05. The effect had disappeared, and with it, Motyl's dreams of youthful fame¹.
It turned out that the problem was not in the data or in Motyl's analyses. It lay in the surprisingly slippery nature of the P value, which is neither as reliable nor as objective as most scientists assume. "P values are not doing their job, because they can't," says Stephen Ziliak, an economist at Roosevelt University in Chicago, Illinois, and a frequent critic of the way statistics are used.
For many scientists, this is especially worrying in light of the reproducibility concerns. In 2005, epidemiologist John Ioannidis of Stanford University in California suggested that most published findings are false²; since then, a string of high-profile replication problems has forced scientists to rethink how they evaluate results.
At the same time, statisticians are looking for better ways of thinking about data, to help scientists to avoid missing important information or acting on false alarms. "Change your statistical philosophy and all of a sudden different things become important," says Steven Goodman, a physician and statistician at Stanford. "Then 'laws' handed down from God are no longer handed down from God. They're actually handed down to us by ourselves, through the methodology we adopt."
Out of context
P values have always had critics. In their almost nine decades of existence, they have been likened to mosquitoes (annoying and impossible to swat away), the emperor's new clothes (fraught with obvious problems that everyone ignores) and the tool of a "sterile intellectual rake" who ravishes science but leaves it with no progeny³. One researcher suggested rechristening the methodology "statistical hypothesis inference testing"³, presumably for the acronym it would yield.
The irony is that when UK statistician Ronald Fisher introduced the P value in the 1920s, he did not mean it to be a definitive test. He intended it simply as an informal way to judge whether evidence was significant in the old-fashioned sense: worthy of a second look. The idea was to run an experiment, then see if the results were consistent with what random chance might produce. Researchers would first set up a 'null hypothesis' that they wanted to disprove, such as there being no correlation or no difference between two groups. Next, they would play the devil's advocate and, assuming that this null hypothesis was in fact true, calculate the chances of getting results at least as extreme as what was actually observed. This probability was the P value. The smaller it was, suggested Fisher, the greater the likelihood that the straw-man null hypothesis was false.
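As a rough illustration of the calculation just described, the sketch below computes a P value for a simple coin-tossing experiment in Python. The null hypothesis is a fair coin; the counts (60 heads in 100 tosses) and the function name are made up for illustration and do not come from Fisher or from any study mentioned here. 'At least as extreme' is read as 'no more probable under the null hypothesis than what was observed', which is one common convention for a two-sided exact test.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Two-sided exact P value for a binomial experiment.

    Adds up the null-hypothesis probability of every outcome that is
    no more probable than the observed one.
    """
    def pmf(k: int) -> float:
        # Probability of exactly k successes if the null hypothesis is true.
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance so that equally likely outcomes are not
    # dropped because of floating-point rounding.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))


# Hypothetical experiment: 60 heads in 100 tosses of a supposedly fair coin.
p = binomial_p_value(60, 100)
print(f"P value: {p:.4f}")
```

With these hypothetical counts the P value comes out a little above 0.05. Note what the number is: the chance of data this lopsided if the coin is fair, not the chance that the coin is fair.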
For all the P value's apparent precision, Fisher intended it to be just one part of a fluid, non-numerical process that blended data and background knowledge to lead to scientific conclusions. But it soon got swept into a movement to make evidence-based decision-making as rigorous and objective as possible. This movement was spearheaded in the late 1920s by Fisher's bitter rivals, Polish mathematician Jerzy Neyman and UK statistician Egon Pearson, who introduced an alternative framework for data analysis that included statistical power, false positives, false negatives and many other concepts now familiar from introductory statistics classes. They pointedly left out the P value.
But while the rivals feuded (Neyman called some of Fisher's work mathematically "worse than useless"; Fisher called Neyman's approach "childish" and "horrifying [for] intellectual freedom in the west"), other researchers lost patience and began to write statistics manuals for working scientists. And because many of the authors were non-statisticians without a thorough understanding of either approach, they created a hybrid system that crammed Fisher's easy-to-calculate P value into Neyman and Pearson's reassuringly rigorous rule-based system. This is when a P value of 0.05 became enshrined as 'statistically significant', for example. "The P value was never meant to be used the way it's used today," says Goodman.
What does it all mean?
One result is an abundance of confusion about what the P value means⁴. Consider Motyl's study about political extremists. Most scientists would look at his original P value of 0.01 and say that there was just a 1% chance of his result being a false alarm. But they would be wrong. The P value cannot say this: all it can do is summarize the data assuming a specific null hypothesis. It cannot work backwards and make statements about the underlying reality. That requires another piece of information: the odds that a real effect was there in the first place. To ignore this would be like waking up with a headache and concluding that you have a rare brain tumour: possible, but so unlikely that it requires a lot more evidence to supersede an everyday explanation such as an allergic reaction. The more implausible the hypothesis (telepathy, aliens, homeopathy), the greater the chance that an exciting finding is a false alarm, no matter what the P value is.
[Figure: 'Probable cause'. Credit: R. Nuzzo; source: T. Sellke et al. Am. Stat. 55, 62–71 (2001)]
These are sticky concepts, but some statisticians have tried to provide general rule-of-thumb conversions (see 'Probable cause'). According to one widely used calculation⁵, a P value of 0.01 corresponds to a false-alarm probability of at least 11%, depending on the underlying probability that there is a true effect; a P value of 0.05 raises that chance to at least 29%. So Motyl's finding had a greater than one in ten chance of being a false alarm. Likewise, the probability of replicating his original result was not 99%, as most would assume, but something closer to 73%, or only 50% if he wanted another 'very significant' result⁶,⁷. In other words, his inability to replicate the result was about as surprising as if he had called heads on a coin toss and it had come up tails.
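The percentages above can be reproduced with a short Python sketch. It rests on two assumptions that the article does not spell out. First, the false-alarm figures are taken to come from the bound −e·p·ln(p) on the Bayes factor in favour of the null hypothesis, from the Sellke et al. paper credited as the source of the 'Probable cause' figure, combined with even prior odds. Second, the replication figures are taken to be the chance that a same-sized repeat reaches significance if the true effect equals the one originally observed; the cited papers may derive them differently. The function and parameter names are illustrative only.

```python
from math import e, log
from statistics import NormalDist


def min_false_alarm_probability(p_value: float, prior_odds_null: float = 1.0) -> float:
    """Lower bound on the probability that the null hypothesis is true.

    Uses the bound -e * p * ln(p) on the Bayes factor for the null,
    which applies for p < 1/e. prior_odds_null is the prior odds of
    'no effect' against 'real effect' (1.0 means a toss-up).
    """
    if not 0 < p_value < 1 / e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    bayes_factor_bound = -e * p_value * log(p_value)
    posterior_odds_null = prior_odds_null * bayes_factor_bound
    return posterior_odds_null / (1 + posterior_odds_null)


def replication_probability(p_original: float, alpha: float = 0.05) -> float:
    """Chance that an identical repeat gives a two-sided P below alpha,
    in the same direction, ASSUMING the true effect equals the observed one."""
    std_normal = NormalDist()
    z_observed = std_normal.inv_cdf(1 - p_original / 2)
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(z_observed - z_critical)


for p in (0.01, 0.05):
    print(f"P = {p}: false-alarm probability at least {min_false_alarm_probability(p):.0%}")
print(f"Replication at 0.05: {replication_probability(0.01, 0.05):.0%}")
print(f"Replication at 0.01: {replication_probability(0.01, 0.01):.0%}")
```

Under these assumptions the sketch gives roughly 11% and 29% for the false-alarm bounds, and roughly 73% and 50% for replication. Changing `prior_odds_null` shows how strongly the answer depends on prior plausibility: with odds of 19 to 1 against a real effect and P = 0.05, the bound rises to about 89%.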
Critics also bemoan the way that P values can encourage muddled thinking. A prime example is their tendency to deflect attention from the actual size of an effect. Last year, for example, a study of more than 19,000 people showed⁸ that those who meet their spouses online are less likely to divorce (p
we're now seeing. We just don't yet have all the fixes.
Statisticians have pointed to a number of measures that might help. To avoid the trap of thinking about results as significant or not significant, for example, Cumming thinks that researchers should always report effect sizes and confidence intervals. These convey what a P value does not: the magnitude and relative importance of an effect.
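A minimal Python sketch of what reporting an effect size and a confidence interval might look like for two groups follows. The data are invented, the function name is hypothetical, and the interval uses a normal approximation, which is crude for samples this small; a t-based interval would be the more usual choice in practice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def effect_size_and_ci(group_a, group_b, confidence=0.95):
    """Return (mean difference, Cohen's d, confidence interval for the difference)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    diff = mean(group_a) - mean(group_b)

    # Cohen's d: the difference expressed in units of the pooled standard deviation.
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    cohens_d = diff / pooled_sd

    # Normal-approximation interval around the raw difference.
    standard_error = sqrt(var_a / n_a + var_b / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, cohens_d, (diff - z * standard_error, diff + z * standard_error)


# Invented measurements for two groups.
treated = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5]
control = [4.8, 4.7, 5.0, 5.3, 4.9, 5.1]
diff, d, (low, high) = effect_size_and_ci(treated, control)
print(f"difference = {diff:.2f}, Cohen's d = {d:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The output answers 'how much of an effect is there, and how precisely is it known?' rather than only 'is there an effect?'.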
Many statisticians also advocate replacing the P value with methods that take advantage of Bayes' rule: an eighteenth-century theorem that describes how to think about probability as the plausibility of an outcome, rather than as the potential frequency of that outcome. This entails a certain subjectivity, something that the statistical pioneers were trying to avoid. But the Bayesian framework makes it comparatively easy for observers to incorporate what they know about the world into their conclusions, and to calculate how probabilities change as new evidence arises.
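To make the idea of updating concrete, here is a small Python sketch of Bayes' rule applied repeatedly to a single yes-or-no hypothesis. Every number in it is hypothetical: a starting plausibility of 5%, and a result that turns up 80% of the time if the hypothesis is true but 5% of the time if it is false. The function name is illustrative.

```python
def bayes_update(prior: float, p_data_if_true: float, p_data_if_false: float) -> float:
    """Posterior probability of a hypothesis after seeing one piece of data."""
    numerator = prior * p_data_if_true
    evidence = numerator + (1 - prior) * p_data_if_false
    return numerator / evidence


# Hypothetical long-shot hypothesis, updated after each of three
# independent supporting results.
belief = 0.05
for study in range(1, 4):
    belief = bayes_update(belief, p_data_if_true=0.8, p_data_if_false=0.05)
    print(f"after study {study}: plausibility = {belief:.3f}")
```

The prior is where the subjectivity enters: a sceptic who starts from a smaller number will need more studies to reach the same confidence.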
Others argue for a more ecumenical approach, encouraging researchers to try multiple methods on the same data set. Stephen Senn, a statistician at the Centre for Public Health Research in Luxembourg City, likens this to using a floor-cleaning robot that cannot find its own way out of a corner: any data-analysis method will eventually hit a wall, and some common sense will be needed to get the process moving again. If the various methods come up with different answers, he says, "that's a suggestion to be more creative and try to find out why", which should lead to a better understanding of the underlying reality.
Simonsohn argues that one of the strongest protections for scientists is to admit everything. He encourages authors to brand their papers 'P-certified, not P-hacked' by including the words: "We report how we determined our sample size, all data exclusions (if any), all manipulations and all measures in the study." This disclosure will, he hopes, discourage P-hacking, or at least alert readers to any shenanigans and allow them to judge accordingly.
A related idea that is garnering attention is two-stage analysis, or 'preregistered replication', says political scientist and statistician Andrew Gelman of Columbia University in New York City. In this approach, exploratory and confirmatory analyses are approached differently and clearly labelled. Instead of doing four separate small studies and reporting the results in one paper, for instance, researchers would first do two small exploratory studies and gather potentially interesting findings without worrying too much about false alarms. Then, on the basis of these results, the authors would decide exactly how they planned to confirm the findings, and would publicly preregister their intentions in a database such as the Open Science Framework (https://osf.io). They would then conduct the replication studies and publish the results alongside those of the exploratory studies. This approach allows for freedom and flexibility in analyses, says Gelman, while providing enough rigour to reduce the number of false alarms being published.
More broadly, researchers need to realize the limits of conventional statistics, Goodman says. They should instead bring into their analysis elements of scientific judgement about the plausibility of a hypothesis and study limitations that are normally banished to the discussion section: results of identical or similar experiments, proposed mechanisms, clinical knowledge and so on. Statistician Richard Royall of Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland, said that there are three questions a scientist might want to ask after a study: 'What is the evidence?' 'What should I believe?' and 'What should I do?' One method cannot answer all these questions, Goodman says: "The numbers are where the scientific discussion should start, not end."
Nature 506, 150–152 (13 February 2014) doi:10.1038/506150a
See Editorial page 131
References
1. Nosek, B. A., Spies, J. R. & Motyl, M. Perspect. Psychol. Sci. 7, 615–631 (2012).
2. Ioannidis, J. P. A. PLoS Med. 2, e124 (2005).
3. Lambdin, C. Theory Psychol. 22, 67–90 (2012).
4. Goodman, S. N. Ann. Internal Med. 130, 995–1004 (1999).
5. Goodman, S. N. Epidemiology 12, 295–297 (2001).
6. Goodman, S. N. Stat. Med. 11, 875–879 (1992).
7. Gorroochurn, P., Hodge, S. E., Heiman, G. A., Durner, M. & Greenberg, D. A. Genet. Med. 9, 325–331 (2007).
8. Cacioppo, J. T., Cacioppo, S., Gonzaga, G. C., Ogburn, E. L. & VanderWeele, T. J. Proc. Natl Acad. Sci. USA 110, 10135–10140 (2013).
9. Simmons, J. P., Nelson, L. D. & Simonsohn, U. Psychol. Sci. 22, 1359–1366 (2011).
10. Simonsohn, U., Nelson, L. D. & Simmons, J. P. J. Exp. Psychol. http://dx.doi.org/10.1037/a0033242 (2013).
11. Campbell, J. P. J. Appl. Psych. 67, 691–700 (1982).
Related stories and links
From nature.com
Number crunch, 12 February 2014
Policy: NIH plans to enhance reproducibility, 27 January 2014
Weak statistical standards implicated in scientific irreproducibility, 11 November 2013
Uncertainty on trial, 02 October 2013
Matters of significance, 29 August 2013
Announcement: Reducing our irreproducibility, 24 April 2013
Replication studies: Bad copy, 16 May 2012
Blog post: Let's give statistics the attention it deserves in biological research
Blog post: Statistics is the sexy in science
Nature special: Challenges in irreproducible research
From elsewhere
Psychological Science tutorial on alternatives to the P value
The BUGS (Bayesian inference Using Gibbs Sampling) Project
Bayesian Cognitive Modelling: A Practical Course
Author information
Affiliations
Regina Nuzzo is a freelance writer and an associate professor of statistics at Gallaudet University in Washington DC.
Comments
Guest 2014-03-06 07:31 PM
Can anyone help me understand the "probable cause" picture of the paper? I admit that I am lost. 1. What is the meaning of "odds of hypothesis"? A hypothesis can be right or wrong. What do the odds of it mean? If we know the odds, do we still need to know the p value? 2. How can I get the numbers in the picture: with 1 to 19 odds of hypothesis + p value = 0.05 -> odds become 11% vs. 89%. Thanks.
Charles Green 2014-02-21 04:21 PM
In statistics P values, although called confidence values, are not measures of accuracy. They are a statement of the distribution of results that can be obtained when the test is replicated. If one uses them correctly they are a way to select future projects. A test with a very high P could be replicated with a smaller sample, thus saving in the cost of replication. If the replication results fail then another test is needed with a larger sample. Only after many tests can conclusions be made. Use of different tests does not improve the situation. I tested magazine vs TV spending with three tests, all with P = 98 to 99.4. All were wrong due to an error in the underlying data. I assumed continuous data when it was discrete data. More than three quarters of my education in statistics dealt with errors in design and consideration of the underlying data.
David Clarke 2014-02-18 05:40 PM
I'm probably stupid but how am I supposed to assess the probability of my hypothesis into a "long shot" at 19 to 1, a "toss-up" at 1 to 1, a "good bet" at 9 to 1?
Barry Cohen 2014-02-15 10:37 PM
Regina: In your figure labeled Probable Cause, you cite Sellke et al. (2001) as the source. I read that article, and could not see how you derived your figure from that article. Specifically, Sellke et al. (2001) needed to posit a value for xi, which as they defined it corresponds to what psychologists usually call delta (the basis for determining the power of a t test), in order to find the odds for different p values (see their Figure 2). It is not clear what value for xi you are using, though from your results, I would guess it is about .75, which would lead to an average power for real effects of about .12 for a .05, two-tailed significance test. More generally, your figure relies on the chance of a "real effect" in each case, but are you defining a real effect as anything other than exactly zero? Doesn't it matter how large, on average, these real effects are? In other words: We know the probability of obtaining a p value of .05 or smaller when there is no real effect, but doesn't the probability of obtaining a p value of .05 or smaller when there IS a real effect depend on the size of the real effect (for a given sample size)? What assumption are you making there?
deborah mayo 2014-02-15 03:53 AM
Regina: There is a citation from Neyman in this article but I don't see the reference. I'd be grateful if you provided it. I'm fairly sure it's entirely taken out of context. "worse than useless" is a technical term. Poor Motyl was on the brink of scientific glory by means of shoddy statistics! Glory, I tell you, glory. Maybe he should be given a medal for not rushing into print as imagined by those who view science as an unthinking screening effort. Of course, it can't be that he's fallen into the dumbest of dumb misuses of p values. It cannot be that he's exploiting fraudulent uses of statistics. No, this author blames the statistical tools for his highly questionable exploitation of p values. The truth is that the only shades of grey here is the fact that misuse of statistics differs only in degree from out and out fraud. Any inference is questionable if the researcher cannot show that flaws in his or her analysis would have been detected with high probability. Fields (like this one) that regularly spin out results without showing they have worked hard or have even tried to subject their own analysis to severe scrutiny are pseudoscientific. Pseudoscientists are frauds and should be treated as such. Science writers who exploit the fashion of dumping on p values only give them excuses.
David Lovell 2014-02-15 02:52 AM
Thanks for raising awareness on this Regina. I would be very interested to see a follow-up article or comments about False Discovery Rate (FDR) procedures used in situations where multiple comparisons are made. I'm not a statistician; my intuition is that FDR further masks the shortcomings of Null Hypothesis Significance Testing. As far as I understand it, FDR amounts to setting a more stringent p value at which one regards data to be statistically significant. FDR procedures abound in bioinformatics and other areas of modern quantitative bioscience where measurements are plentiful. Is my skepticism of FDR warranted?
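For readers unfamiliar with the procedures this comment refers to, the Python sketch below implements the Benjamini–Hochberg step-up procedure, one common FDR method. The comment does not say which procedure is meant, so this choice is an assumption, and the P values in the example are invented. The sketch bears out part of the commenter's description: the procedure amounts to a cut-off on P values, but one that is set by the data, becoming stricter as more comparisons are made and fewer of them look promising.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of the hypotheses rejected by the Benjamini-Hochberg procedure.

    Sort the m P values, find the largest rank k whose P value is at most
    (k / m) * fdr, and reject the hypotheses with the k smallest P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            largest_k = rank
    return sorted(order[:largest_k])


# Invented P values from ten comparisons.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values))
```

In this invented example five P values fall below 0.05, but only the two smallest survive the procedure.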
Ben Wise 2014-02-14 03:11 PM
Morris DeGroot at CMU made this point way back in the 1980's. P values and "significance" measure the probability of data given the hypothesis, not the probability of the hypothesis given the data. This is exactly backwards, as (to quote DeGroot) "I already know the probability of the data: 1, because I just observed it!". It is easy to find actual cases where P(D|H)=0.99 and P(D|¬H)=0.99, but P(H|D)=0.01. In English, the hypothesis has a high P value and is extremely "significant" but is almost certainly wrong. No wonder so many medical studies are overturned by later studies: they were highly significant, but not very probable. A comment was made below that Bayesian methods must be used carefully. I just repeat DeGroot's response (which I heard when he was confronted with the same criticism): it is better to do the right calculation carefully than to do the wrong one easily.
deborah mayo 2014-02-15 03:42 AM
p values are NOT likelihoods, however, they permit computations that Bayesian likelihoods alone cannot. They allow evaluating the probability that the testing procedure would have resulted in a less impressive departure (from the null) under the assumption the null is true, and also under the assumption of varying discrepancies from the null. It's a small part of the panoply of methods that use error probabilities. Guess what? Bayesians are the ones who only use likelihoods conditional on the observed value! So no error probabilistic assessments are possible. Oh, but there's a prior you say? No error control there either, just what someone believes, and very few scientists want to mix their prior beliefs into the study. The point of the research is to test claims, not beg the question by imputing prior beliefs!
Huw Llewelyn 2014-02-14 12:59 PM
The common sense question faced by those who interpret data in a public way (e.g. doctors, engineers and research scientists) is: What do I predict from this observation? The answer can be that (1) the observation will probably not be replicated and is probably spurious, or (2) that it suggests a simple prediction or a prediction linked to possible narrative or mathematical models (e.g. a diagnosis, a working engineering model, a general scientific hypothesis, theory or law) that will in turn make many other useful predictions. The first hurdle to overcome is whether or not the observation will probably be replicated. This can be done by showing that all the possible reasons for non-replication are improbable. One of these causes of non-replication is that the number of observations is too low (if the observation is made up of a number of different observations). This is where statistical significance testing comes in (successfully repeating the entire set of observations independently makes the probability of further non-replication due to this reason very low of course). There are many other reasons for non-replication to be considered, e.g. poor documentation or vague writing, dishonesty, poor methodology, contradictory observations by someone else, etc. In order for the probability of replication to be high, the probabilities of all these causes of non-replication also have to be low. The reasoning process inevitably has to be subjective but there is a formal basis for it in probability theory (that incorporates Bayes' rule) to guide us (see also Llewelyn H. Reasoning in medicine and science. OUPblog, September 2013).
Mark Brewer 2014-02-13 04:51 PM
I'm glad this article has provoked discussion. What I find surprising is the fact that the "Probable Cause" infographic presents a beautiful argument for a Bayesian approach, without actually saying so, or even realising it is doing so.
HT 2014-02-13 08:30 PM
I don't think that replacing frequentist with the Bayesian approach is the answer, nor is that the message of the article. Bayesian statistics can demonstrate the shortcomings of frequentist statistics very well in some situations (like in the infographic), but also requires great care to handle. It would be naive to think that researchers who abuse p values would not do the same to priors and model specification.
deborah mayo 2014-02-15 03:48 AM
They'd necessarily do worse and fraud-busting would be dead. Why? All criticisms turn on being able to evaluate error probabilities (even if only informally), e.g., showing the study like Motyl's has done practically nothing to prevent the worst kind of abuse and fraudulent use of statistics. I agree it's a nice picture, but the article is misleading in dozens of ways. Simonsohn is interviewed but the author doesn't bother to mention that he points out how Bayesian statistics only introduces more flexibility into the analysis. It's quite a biased article, which really defeats the purpose.
Paul Hayes 2014-02-15 07:11 AM
Simonsohn was wrong on that point: http://doingbayesiandataanalysis.blogspot.co.uk/2011/10/falseconclusionsinfalsepositive.html
Bob O'Hara 2014-02-14 10:10 AM
Indeed. In fact, we could just replace the abuse of p values with the abuse of Bayes factors and Bayesian p values.
John Vidale 2014-02-13 04:50 PM
Very good article, but it misses the mark in two ways, IMO. This is really a primer for the public to take science headlines with a grain of salt. First, the underlying reason for misuse of statistics is the natural optimism of scientists: we think we will find what no one has previously found, and that our experiment was the one sensible way to explore the problem. That is, we overestimate the a priori likelihood that our solution was right, and we underestimate the amount of fiddling we (and others) have done leading up to our latest result. Second, scientists vary greatly in their familiarity with statistics and basic common sense; they always have and always will. Requiring tedious publication of all data, studies through sequential publications, application of multiple statistical tests in each study may ameliorate some problems, but will impede many others. As has been true forever, scientists looking at data need to understand statistical tools to use them right, as asserted. I also doubt it is a new phenomenon that scientists recognize the fallibility of the latest, hottest study. I recall several of their editors telling me that many (most) Science and Nature papers are incorrect.
Allen Bryant 2014-02-13 04:04 PM
Having taught Statistics for a number of years, the issue of what the value of the P value is, isn't really that important. What is important is a properly structured hypothesis test. The P value is a measure of what degree of confidence we wish to know something might be true should the hypothesis test prove that we can reject our null hypothesis in favor of the alternative hypothesis. If results are not reproducible, perhaps your hypothesis can't be rejected and you need to completely reconsider your hypothesis.
Ben Wise 2014-02-14 03:23 PM
Morris DeGroot had a comment on this line of reasoning back in the 1980's. The only way to judge when it is "properly structured" is by comparing it to Bayesian reasoning, that is, to make sure a high P(D|H) occurs only when P(H|D) is high. But if you have to do the exact Bayesian analysis in order to make sure the p-value heuristic (as he called it) is doing the right thing, why not just keep the Bayesian analysis and skip the heuristic? This is the approach I taught my graduate students; I'd recommend DeGroot's work as a very balanced approach that combines both practical common sense (you need p values to get published, and they are easier to calculate) and theoretical rigor (when can they be relied upon).
Thomas Dent 2014-02-13 01:43 PM
The author should take her own advice and show some valid statistics to back up sweeping, off-the-cuff claims about what 'most scientists' or 'most researchers' might or might not do. "Most scientists would look at his original P value of 0.01 and say that there was just a 1% chance of his result being a false alarm." Define 'most scientists'. What evidence is there for this claim? What is your sample size: how many scientists have been objectively tested on what they would say in these circumstances? What is the effect size: how many did in fact say the thing you claimed? Is it a fair sample of all scientists, or are some disciplines or some levels of seniority or some nationalities over- or under-represented? How can we be sure the author doesn't cherry-pick conversations where someone appears not to understand p values? This is not a joke; it's a very serious point. There are scientists who do understand p values and put considerable effort into using them correctly; the particle physics community is one example. Articles like this one which blame 'the p value' for everything people do wrong with statistics, as if the method itself rather than the uses to which it is put was somehow the root of all evil, amount to unfairly smearing results obtained by a correct and rigorous usage of p values, i.e. with blind data analysis and honestly accounted-for trials factors (the 'look-elsewhere effect'). To say that muddled thinking and self-deception are *caused* by use of p values is absurd; people who are prone to muddled thinking and self-deception will carry on being so regardless of the statistical framework. You might as well claim that abuse of significant figures is caused by the use of the decimal point.
Ben Wise 2014-02-14 03:16 PM
I think the driving factor is not "most scientists" but the reviewers of major journals. It is essentially impossible to get a paper published without a high p value, which drives everyone else to design their work to generate high p values, whether they agree with them or not. "Publish or perish".
Abhay Sharma 2014-02-13 11:01 AM
Over-selection and over-reporting of false positive results are increasingly plaguing the published research with an alarming rate (Nature 485, 149; 2012). In the current practice, such reporting is considered as honest errors not amounting to misconduct (Nature 485, 137; 2012). However, since intention is the core of misconduct, one may very well argue that reporting of results with systematic positive bias should also be placed under the ambit of misconduct. Scientific community and policy makers need to consider this tough option in the overall interest of science. [This is a part of the comments made earlier (http://www.nature.com/news/policynihplanstoenhancereproducibility1.14586)].
Mark Alexander 2014-02-13 10:34 AM
In spite of the article's comment that p values from research into phenomena like telepathy are likely to be "false alarms", in point of fact, some of the most significant p values in any area of research come from precisely this direction. I think of the ganzfeld studies, which have produced mind-boggling probability values on the order of 10^-18. Goodman's formula doesn't do such a value any damage. The comment in the article, especially insofar as it groups such research with "homeopathy and aliens" as a label of derision, reflects a widespread but regrettable lack of knowledge about what has been achieved in this area.
Mark Brewer 2014-02-13 05:09 PM
I'm afraid that p values are always going to be flawed (quoting numbers of the order of 10^-18 just smacks of desperation) when the basic underlying "science" is flawed.
Mark Alexander 2014-02-14 07:08 AM
Nevertheless, those p values are objectively present. On the one hand, findings in this area are routinely dismissed because 'extraordinary claims require extraordinary evidence'. But then, when p values such as these are presented, it only 'smacks of desperation'. What kind of "science" is that?
Paul Hayes 2014-02-15 02:12 AM
It's good science. Interpreting a probability of only 10^-18 that your telepathy experiment results were caused by 'chance' as evidence that they were caused by telepathy, given what is already known from relevant previous results, is bad science. Very bad science. http://blogs.discovermagazine.com/cosmicvariance/2008/02/18/telekinesisandquantumfieldtheory/#.Uv7L8XWbBiB http://wwwbiba.inrialpes.fr/Jaynes/cc05e.pdf
Mark Alexander 2014-02-18 09:43 AM
Actually acquaint yourself with the literature before you pass judgement. Or at least a good chunk of it. In spite of making reference to the 'quality of previous results', it's clear that you haven't seen them, and that's precisely the point I'm making. Your first link rules out these phenomena as a matter of principle. Your second link doesn't impact on the ganzfeld results I'm citing in any way whatsoever. It's simply a second-hand discussion of why all such results must surely be mistaken. In other words, all you've done is cite two claims why such results should be dismissed out of hand. And no, that's not good science.
Chandrika B Rao 2014-02-13 09:15 AM
Misinterpretation of p value seldom comes from biostatisticians. For the biologist, after having done many months of work generating the data, the final statistical analysis seems to be a minor matter, not deserving much attention. Many biologists prefer to do their own statistical analysis rather than involving another person for this 'minor' work. The cautious or conservative interpretation of data provided by the statistician sometimes doesn't go down well with biologists wanting definitive conclusions from their hard work. Many non-statistical journals publish p values without any associated information like: what was the sample size, what statistical test was done, what hypothesis was being tested, what was the power of the test etc., resulting in sloppy statistical analysis and sloppier reporting, making true the saying, "there are lies, damn lies and statistics". Non-statistician referees seldom ask adequate questions about the statistical methods used and analysis done. When an article with substandard statistical work gets accepted for publication in a good biology journal, the biologist no longer feels the need to talk with a statistician. Journals with word limits also unconsciously encourage cutting corners on statistical analysis reporting. I am very surprised by the heat of "We must collaborate with statisticians, not let them decide what is good for patients." coming from Giovanni Codacci-Pisanelli. Are statisticians irrational, unforgiving ogres, not caring for the good of the patients??
Peter Gerard Beninger 2014-02-13 09:13 AM
I'm happy to see that this issue is beginning to emerge on the radar of more scientists, notably the readers of Nature and Science. However, the ones who persist in the worst, and most common, misuse of frequentist statistics rarely, if ever, read these journals, and seem equally oblivious to the vast number of publications, in all fields, which make the same points. Their papers constitute the majority of many mainstream specialty journals. I have taken this up directly with senior editors, who simply reply that they do their jobs by relying on reviewers for quality control. My suggestion that each reputable journal should have a full-time statistician on board to review the procedures used in all 'provisionally accepted' papers, as well as for all statistically contested papers (as is the practice in the best medical journals), has so far fallen on deaf ears for all of the journals in my own research field. The situation will only improve if we push the publishers hard enough.
Jane Public 2014-02-13 08:06 AM
Brian: I think I can answer at least part of this for you. I had a discussion about this with someone just the other day. Although I don't think he got the point. Anyway, let's use a purely hypothetical example to illustrate the point. Let's say someone decides to study the IQs of the students at universities, and test correlations between IQ and various other factors. IQ tests are administered to 10,000 students, and the results more or less follow the expected normal distribution, with a measurement error of +/- 2 points. So now they start comparing with other measured factors. And they find something very surprising: there is a very strong inverse correlation (P=0.001... they're VERY sure of this), between nipple size and IQ! (Hey... I've seen much sillier things in studies before.) So... they go on the evening news with their startling discovery. But what does this mean? Well, if you were to look at the effect size, it turns out that people with areolae that measured 1.5 cm across had an IQ that was 0.02 points higher than students whose areolae were 6 cm across. So the effect, 0.02 IQ points, is very, very small. Even though there is strong statistical evidence of a correlation, the actual effect is so small as not to really matter. Even worse, the effect is smaller than the measurement error for their IQ tests. (Pretty much invalidating their work, if anybody bothered to check.) So this "statistical significance", while very strong, has about zero "significance" in the real world. Although this is a somewhat exaggerated example, this kind of thing is not that unusual. As I say, I was trying to explain this to someone the other day, about exactly this kind of announcement: a reported strong correlation, but the effect size was tiny, and essentially buried in the small print.
Briancrawford 2014021210:21PMIlikedthearticlebuthaveaquickquestion.Whentheauthorsays"TopounceontinyPvaluesandignorethelargerquestionistofallpreytotheseductivecertaintyofsignificance,saysGeoffCumming,anemerituspsychologistatLaTrobeUniversityinMelbourne,Australia.Butsignificanceisnoindicatorofpracticalrelevance,hesays:Weshouldbeasking,'Howmuchofaneffectisthere?',not'Isthereaneffect?'"Howdoyoudecidewhatlevelofeffectisappropriatetoreport?Isitjustsubjectivedecision?Forexample,wouldaenrichmentofparticularsetofgenesof54%inonesamplecomparedwith46%inanotherbeenoughofaneffect?Eveniftheyareverysignificant?
Bob OHara 2014-02-13 09:08 AM
You're asking the right question, but I (as a statistician) can't answer it for you: it's biological judgement. And this is a good thing. After all, you are doing science, not statistics, so your judgement of what is 'significant' should be based on science. I think if we all used effect sizes and confidence intervals, a hidden benefit would be that it would make us think more about the actual scientific relevance of our results.
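
As a rough illustration of reporting an effect size with a confidence interval, the sketch below takes the 54% versus 46% figures from the question above and computes a 95% interval for the difference. The group sizes (50 and 5,000 per sample) are assumptions, since the question gives none, and the simple Wald interval and the function name are choices made for this sketch rather than a recommended method.

```python
import math

def difference_ci(p1: float, p2: float, n_per_group: int, z: float = 1.96):
    """Approximate 95% Wald interval for p1 - p2 with equal group sizes."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return diff - z * se, diff + z * se

# Group sizes are ASSUMED; the original question does not state them.
small = difference_ci(0.54, 0.46, 50)
large = difference_ci(0.54, 0.46, 5000)

for label, (low, high) in (("n = 50", small), ("n = 5000", large)):
    print(f"{label}: difference 0.08, 95% CI ({low:+.3f}, {high:+.3f})")
```

The estimated difference is 8 percentage points in both cases; only its precision changes with sample size. Whether 8 points matters biologically is the judgement the interval cannot make.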
Ben Wise 2014-02-14 03:47 PM
There are (at least) two different senses of the word "significant" being mixed together. One is "having a high P(Data|Hypothesis)" and the other is "reasonable to act upon". The IQ correlation example above is one that has high P(D|H) but has no practical implications for anything anyone would decide to do or not. For example, it does not help one decide whether to release a drug onto the market or not. The second sense of "significance" leads one directly into the realm of decision theory and the actual cost of (e.g.) Type I and Type II errors. But decisions invariably involve the weighing of costs and benefits, which fall differently on different groups, and so involve a lot of debates that classical statisticians try to avoid. Again, a nice compromise is to report actual probability values, like P(H|D). They make a nice "decoupled interface", in that they can be taken either as a summary statement of how strongly the hypothesis is indicated, or as the start of a decision-theoretic analysis.
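
The difference between P(D|H) and P(H|D) can be made concrete with Bayes' rule. The sketch below is a deliberately simplified illustration: it treats "the test came out significant" as the data, and the prior probabilities, the power of 0.8 and the 0.05 threshold are all assumed values, not figures from the article or the comment.

```python
def posterior_given_significant(prior: float, power: float, alpha: float) -> float:
    """P(effect is real | significant result), by Bayes' rule.

    prior: assumed probability that the effect is real before the study
    power: P(significant | effect is real)
    alpha: P(significant | no effect), the false-positive rate
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)

# All inputs below are ASSUMED for illustration.
for prior in (0.5, 0.1, 0.01):
    post = posterior_given_significant(prior, power=0.8, alpha=0.05)
    print(f"prior {prior:>4}: P(H|D) = {post:.2f}")
```

The same significant result supports the hypothesis strongly or weakly depending on how plausible it was beforehand, which is why a posterior probability can serve as the starting point for a decision-theoretic analysis in a way that a P value alone cannot.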
Luar Moreno Alvarez 2014-02-12 09:47 PM
Although this is a very good article, it is limited in approach and references to social and biological subjects. Perhaps, in order to achieve a more general and technical view of this important issue, a deeper review of works from Statistics journals would be desirable. The paper 'P-Value Precision and Reproducibility' of Boos & Stefanski in 'The American Statistician' (2011), for example, could be useful to the enrichment of this discussion.
Giovanni Codacci-Pisanelli 2014-02-12 08:16 PM
This article is a long-awaited reminder of what statistics can do, and of what they cannot do! The Peanuts strips about statisticians are funnier... but not as extensive. In clinical oncology p-values often are the aim of the clinical trial. Still, just using a computer spreadsheet programme it is easy to prove that if you enter enough values your p will become significant even when the difference between two means is minimal. Unfortunately "statistically significant" is considered a synonym of "true", but very often it rather seems to mean "clinically irrelevant" (a trial with a 0.7 weeks difference in progression-free survival of patients with advanced cancer had a significant p). But what is even more disturbing is the number of retracted papers (for example on gene signatures) based on the "statistical evaluation" of results... that could not be reproduced. We must collaborate with statisticians, not let them decide what is good for patients.
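
The spreadsheet experiment described above can be sketched in a few lines. The example holds a small difference between two means fixed and only increases the number of values per group. The difference of 0.1 standard deviations and the sample sizes are assumed for illustration, and the calculation is a simple two-sample z-test with known standard deviation, which is only an approximation at the smallest sample size.

```python
import math

def two_sample_p(diff_in_sd: float, n_per_group: int) -> float:
    """Two-sided P value of a z-test for a fixed observed mean difference.

    diff_in_sd: observed difference between group means, in units of
                the (assumed known, common) standard deviation
    """
    z = diff_in_sd * math.sqrt(n_per_group / 2.0)
    return math.erfc(abs(z) / math.sqrt(2.0))

effect = 0.1                           # ASSUMED small difference, in SD units
sample_sizes = [10, 100, 1000, 10000]  # ASSUMED group sizes
p_values = [two_sample_p(effect, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n per group = {n:>5}: P = {p:.3g}")
```

The observed difference never changes; the P value crosses the 0.05 threshold purely because more values were entered.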
Nature ISSN 0028-0836 EISSN 1476-4687
© 2015 Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved. Partner of AGORA, HINARI, OARE, INASP, CrossRef and COUNTER
deborah mayo 2014-02-19 04:10 PM
A long-awaited repeat of a cookbook article that follows the recipe of so many "front page news" stat exposes in every purported science mag for years, and just as shallow... Poor Motyl, it's so hard to do good science...
Bob OHara 2014-02-13 09:05 AM
Don't blame the statisticians! We've been banging on about this for years, but p-values are just so entrenched in the way a lot of scientists think science should be done. The problem is one of inertia: p-values are accepted as standard, so scientists teach their students that this is how things should be done, so that's all they learn.
Steve Schwartz 2014-02-12 07:36 PM
A critical aspect of p-values, and hypothesis testing under a frequentist framework more generally, that is not addressed by this column, is that these techniques were developed and originally implemented in the setting where random error is the only (or at least dominant) reason why a particular study might not yield the correct answer. In the setting of non-experimental research, where the "exposure" or study condition has not been randomly assigned, whether the study yields the true relationship has far more to do with biases in measurement of key variables, in the selection of study subjects, and in accounting for confounding or similar relationships among variables. Neither p-values nor confidence intervals measure these features of a study. But because p-values and confidence intervals are easy to produce, and measures of many non-random biases are not easy to make, the statistical indices have become the coin of the scientific realm in research designs such as observational studies where they were not originally intended to be used.
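
A small simulation can illustrate why a P value says nothing about confounding. In the sketch below, everything is assumed for illustration: a hidden variable z drives both the "exposure" x and the outcome y, while x has no effect on y at all. The variable names, the coefficients and the sample size are hypothetical, and the "adjustment" works only because the simulation knows exactly how z enters each variable.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 5000                 # ASSUMED sample size

z = [rng.gauss(0, 1) for _ in range(n)]   # hidden confounder
x = [zi + rng.gauss(0, 1) for zi in z]    # exposure: depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]    # outcome: depends on z, NOT on x

r_naive = pearson(x, y)
# Remove the confounder's known contribution from both variables.
r_adjusted = pearson([xi - zi for xi, zi in zip(x, z)],
                     [yi - zi for yi, zi in zip(y, z)])

print(f"naive correlation of x and y: {r_naive:.2f}")
print(f"after removing z:             {r_adjusted:.2f}")
```

With thousands of observations the naive correlation would earn a very small P value, yet it reflects the confounder rather than any effect of x on y. In real observational data the confounder is usually unmeasured or imperfectly measured, so no such clean removal is available.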
HT 2014-02-12 06:16 PM
This article serves as a welcome reminder of the many fallacies that still bely modern scientific research. However, I feel it could use a bit more balance in the context of reproducibility. In the paragraph where Motyl's p-value of 0.01 is revisited, the probability of replicating this result at a 'significant' (p