1. common problems with regression. a. inferring causation ...frederic/13/f16/day15.pdfthat there is...

Post on 22-Apr-2020


Stat 13, Intro. to Statistical Methods for the Life and Health Sciences.

1. Common problems with regression.
   a. Inferring causation.
   b. Extrapolation.
   c. Curvature.

2. Testing significance of correlation or slope.

No class Thu Nov 24, Thanksgiving. Read ch 9. Hw4 is 10.1.8, 10.3.14, 10.3.21, 10.4.11 and is due Tue Nov 29.

The final, Fri Dec 9, 8am-11am, right here, will be on ch 1-10. Bring a PENCIL and CALCULATOR and any books or notes you want. No computers. http://www.stat.ucla.edu/~frederic/13/F16.


Common problems with regression.

• a. Correlation is not causation, especially with observational data.


Common problems with regression.
Holmes and Willett (2004) reviewed all prospective studies on fat consumption and breast cancer with at least 200 cases of breast cancer. "Not one study reported a significant positive association with total fat intake.... Overall, no association was observed between intake of total, saturated, monounsaturated, or polyunsaturated fat and risk for breast cancer."

They also state "The dietary fat hypothesis is largely based on the observation that national per capita fat consumption is highly correlated with breast cancer mortality rates. However, per capita fat consumption is highly correlated with economic development. Also, low parity and late age at first birth, greater body fat, and lower levels of physical activity are more prevalent in Western countries, and would be expected to confound the association with dietary fat."

Common problems with regression.
• b. Extrapolation.
• Often researchers extrapolate from high doses to low. The relationship can be nonlinear, though.
• Researchers also often extrapolate from animals to humans.

Zaichkina et al. (2004) on hamsters.

Common problems with regression.
• c. Curvature. The best fitting line might fit poorly. Port et al. (2005). Wong et al. (2011).

How well does the line fit?
• $r^2$ is a measure of fit. It indicates the amount of scatter around the best fitting line.
• Residual plots can indicate curvature, outliers, or heteroskedasticity.
• $\sqrt{1 - r^2}\, s_y$ is useful as a measure of how far off predictions would have been on average.

Note that regression residuals have mean zero, whether the line fits well or poorly.
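Both facts about residuals can be checked numerically. The sketch below uses made-up, deliberately curved data (not from the lecture), and the helper name `fit_line` is hypothetical. It assumes the SD of y is computed with divisor n, under which the RMS residual equals sqrt(1 − r²) times the SD of y exactly; with divisor n − 1 or n − 2 the match is only approximate.

```python
import math

def fit_line(x, y):
    """Least-squares intercept, slope, and correlation for paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

# Made-up data with obvious curvature (y = x squared), so a line fits poorly.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [xi ** 2 for xi in x]
n = len(x)

intercept, slope, r = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# The residuals average to zero even though the fit is poor.
mean_residual = sum(residuals) / n

# Typical prediction error, computed directly and via sqrt(1 - r^2) * s_y.
rms_residual = math.sqrt(sum(e ** 2 for e in residuals) / n)
mean_y = sum(y) / n
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)  # divisor n (assumption)
typical_error = math.sqrt(1 - r ** 2) * sd_y

print(f"mean residual: {mean_residual:.2e}")
print(f"RMS residual: {rms_residual:.4f}")
print(f"sqrt(1 - r^2) * s_y: {typical_error:.4f}")
```

A residual plot of `residuals` against `x` would still show the curvature, which the zero mean alone cannot reveal.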

Common problems with regression.
• d. Statistical significance. Could the observed correlation just be due to chance alone?

Inference for the Regression Slope: Theory-Based Approach. Section 10.5

Do students who spend more time in non-academic activities tend to have lower GPAs? Example 10.4

• The subjects were 34 undergraduate students from the University of Minnesota.
• They were asked questions about how much time they spent in activities like work, watching TV, exercising, non-academic computer use, etc., as well as what their current GPA was.
• We are going to test to see if there is a negative association between the number of hours per week spent on nonacademic activities and GPA.

Hypotheses
• Null Hypothesis: There is no association between the number of hours students spend on nonacademic activities and student GPA in the population.
• Alternative Hypothesis: There is a negative association between the number of hours students spend on nonacademic activities and student GPA in the population.

Descriptive Statistics
• Predicted GPA = 3.60 − 0.0059 (nonacademic hours).
• What do the slope and y-intercept mean?
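One way to read the fitted line: the intercept, 3.60, is the predicted GPA for a student reporting 0 nonacademic hours, and the slope says each additional nonacademic hour per week is associated with a predicted GPA that is 0.0059 lower. The short sketch below just evaluates the line; the function name `predict_gpa` is hypothetical.

```python
def predict_gpa(nonacademic_hours):
    """Predicted GPA from the fitted line on the slide."""
    return 3.60 - 0.0059 * nonacademic_hours

# Intercept: prediction at 0 hours.
gpa_at_zero = predict_gpa(0)

# Slope: change in prediction for one extra hour per week.
change_per_hour = predict_gpa(11) - predict_gpa(10)

print(f"predicted GPA at 0 hours: {gpa_at_zero:.2f}")
print(f"change per extra hour: {change_per_hour:.4f}")
```

These are statements about predictions and association in observational data, not about what would happen to a given student who changed their habits.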

Shuffle to Develop Null Distribution

• We are going to shuffle just as we did with correlation to develop a null distribution.
• The only difference is that we will be calculating the slope each time and using that as our statistic.
• A test of association based on slope is equivalent to a test of association based on a correlation coefficient.
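The shuffling procedure can be sketched in a few lines. The data below are hypothetical (they are not the 34 Minnesota students), the helper `slope` and the seed are arbitrary choices, and 2000 shuffles is an assumption rather than the number the book uses.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: nonacademic hours per week and GPA.
hours = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
gpa = [3.9, 3.6, 3.8, 3.4, 3.5, 3.1, 3.3, 2.9, 3.0, 2.7]

observed_slope = slope(hours, gpa)

rng = random.Random(13)   # fixed seed so the run is reproducible
num_shuffles = 2000
shuffled = gpa[:]
count_as_extreme = 0
for _ in range(num_shuffles):
    rng.shuffle(shuffled)  # break any link between hours and GPA
    # One-sided test: Ha says the slope is negative.
    if slope(hours, shuffled) <= observed_slope:
        count_as_extreme += 1

p_value = count_as_extreme / num_shuffles
print(f"observed slope: {observed_slope:.4f}")
print(f"simulated p-value: {p_value:.4f}")
```

Swapping `slope` for a correlation function would order the shuffles the same way, because shuffling leaves the SDs of x and y unchanged and the slope is the correlation times the ratio of those SDs.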

Beta vs Rho
• Testing the slope of the regression line is equivalent to testing the correlation (same p-value, but obviously different confidence intervals since the statistics are different).
• Hence these hypotheses are equivalent.
• Ho: β = 0, Ha: β < 0 (Slope)
• Ho: ρ = 0, Ha: ρ < 0 (Correlation)
• Sample slope (b), Population (β: beta)
• Sample correlation (r), Population (ρ: rho)
• When we do the theory-based test, we will be using the t-statistic, which can be calculated from either the slope or the correlation.
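The claim that the t-statistic can come from either the slope or the correlation can be verified on any data set. The sketch below uses hypothetical data and the standard formulas t = b / SE(b), with SE(b) = sqrt(SSE / (n − 2)) / sqrt(Sxx), and t = r / sqrt((1 − r²) / (n − 2)); the variable names are illustrative only.

```python
import math

# Hypothetical data, for illustration only.
x = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
y = [3.9, 3.6, 3.8, 3.4, 3.5, 3.1, 3.3, 2.9, 3.0, 2.7]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx                   # sample slope
a = mean_y - b * mean_x         # sample intercept
r = sxy / math.sqrt(sxx * syy)  # sample correlation

# t from the slope: slope divided by its standard error.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_from_slope = b / se_b

# t from the correlation.
t_from_r = r / math.sqrt((1 - r ** 2) / (n - 2))

print(f"t from slope: {t_from_slope:.4f}")
print(f"t from correlation: {t_from_r:.4f}")
```

The two values agree up to floating-point error, which is why the slope test and the correlation test give the same p-value.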

Introduction
• Our null distributions are again bell-shaped and centered at 0 (for either correlation or slope as our statistic).

The book, on p. 549, finds a p-value of 3.3% by simulation.

Validity Conditions
• Under certain conditions, theory-based inference for the correlation or the slope of the regression line uses t-distributions.
• We could use simulations or the theory-based methods for the slope of the regression line.
• We would get the same p-value if we used correlation as our statistic.

Predicting Heart Rate from Body Temperature. Example 10.5A

Heart Rate and Body Temp
• Earlier we looked at the relationship between heart rate and body temperature with 130 healthy adults.
• Predicted Heart Rate = −166.3 + 2.44 Temp
• r = 0.257

Heart Rate and Body Temp

• We tested to see if we had convincing evidence that there is a positive association between heart rate and body temperature in the population using a simulation-based approach. (We will make it 2-sided this time.)
• Null Hypothesis: There is no association between heart rate and body temperature in the population. β = 0
• Alternative Hypothesis: There is an association between heart rate and body temperature in the population. β ≠ 0

Heart Rate and Body Temp

We get a very small p-value (0.0036). A slope as extreme as our observed 2.44 would very rarely happen by chance alone.

Heart Rate and Body Temp

• We can also approximate a 95% confidence interval: observed statistic ± 2 SD of statistic. 2.44 ± 2(0.842) = 0.76 to 4.12.

• What does this mean? We’re 95% confident that, in the population of healthy adults, each 1° increase in body temp is associated with an increase in heart rate of between 0.76 and 4.12 beats per minute.
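The interval arithmetic is simple enough to check directly. This sketch only plugs in the slope (2.44) and the SD of the shuffled slopes (0.842) given above; the variable names are illustrative.

```python
observed_slope = 2.44  # beats per minute per degree
sd_of_slopes = 0.842   # SD of the simulated null distribution of slopes

margin = 2 * sd_of_slopes
lower = observed_slope - margin
upper = observed_slope + margin

print(f"approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

Because the whole interval lies above 0, it agrees with the small two-sided p-value: a slope of 0 is not a plausible value.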

Heart Rate and Body Temp

• The theory-based approach should work well since the distribution has a nice bell shape.
• Also check the scatterplot.

Heart Rate and Body Temp

• We will use the t-statistic to get our theory-based p-value.
• We will find a theory-based confidence interval for the slope.
• On p. 554, the book notes the formula

  $t = \dfrac{r}{\sqrt{(1 - r^2)/(n - 2)}}$.

• Here the t statistic is 2.97.
• The p-value is 0.36%. So the correlation is statistically significantly greater than zero.
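As a rough check, the standard formula relating the t-statistic to the correlation, t = r / sqrt((1 − r²) / (n − 2)), can be evaluated with the values reported earlier (r = 0.257, n = 130). This is a sketch with illustrative variable names. It gives a value of about 3.0, slightly above the 2.97 reported, presumably because the r shown on the slide is rounded.

```python
import math

r = 0.257  # sample correlation as reported (rounded)
n = 130    # number of healthy adults

# t-statistic computed from the correlation and the sample size.
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))

print(f"t = {t_stat:.2f} with {n - 2} degrees of freedom")
```

The p-value then comes from a t-distribution with n − 2 = 128 degrees of freedom, which is not computed here.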

Smoking and Drinking. Example 10.5B

Validity Conditions
Remember our validity conditions for theory-based inference for the slope of the regression equation.

1. The scatterplot should follow a linear trend.
2. There should be approximately the same number of points above and below the regression line (symmetry).
3. The variability of vertical slices of the points should be similar. This is called homoskedasticity.

Validity Conditions
• Let’s look at some scatterplots that do not meet the requirements.

Smoking and Drinking
The relationship between number of drinks and cigarettes per week for a random sample of students at Hope College.

The dot at (0,0) represents 524 students.

Are the conditions met? Hard to say. The book says no.

Smoking and Drinking
• When the conditions are not met, applying simulation-based inference is preferable to theory-based t-tests and CIs.

Validity Conditions

• What do you do when validity conditions aren’t met for theory-based inference?
• Use the simulation-based approach.
• Another strategy is to “transform” the data onto a different scale so the conditions are met.
• The logarithmic scale is common.
• One can also fit a different curve, not necessarily a line.
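To illustrate the transformation strategy, the sketch below uses made-up data that grow exponentially, so a straight line fits the raw values poorly but fits the logarithms essentially perfectly. The data and the helper name `correlation` are hypothetical, and real data would not line up this neatly.

```python
import math

def correlation(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Made-up exponential growth: y = 2 * exp(0.5 * x).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# On the log scale the relationship is linear: log(y) = log(2) + 0.5 * x.
log_y = [math.log(yi) for yi in y]

r_raw = correlation(x, y)
r_log = correlation(x, log_y)

print(f"r on the original scale: {r_raw:.4f}")
print(f"r on the log scale: {r_log:.4f}")
```

After transforming, the validity conditions (linearity, symmetry, similar vertical spread) should be rechecked on the new scale before using theory-based inference.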
