1. common problems with regression. a. inferring causation ...frederic/13/f16/day15.pdfthat there is...

33
Stat 13, Intro. to Statistical Methods for the Life and Health Sciences. 1. Common problems with regression. a. Inferring causation. b. Extrapolation. c. Curvature. 2. Testing significance of correlation or slope. No class Thu Nov 24, Thanksgiving. Read ch9. Hw4 is 10.1.8, 10.3.14, 10.3.21, 10.4.11 and is due Tue Nov 29. The final Fri Dec 9, 8am-11, right here, will be on ch1-10. Bring a PENCIL and CALCULATOR and any books or notes you want. No computers. http://www.stat.ucla.edu/~frederic/13/F16 . 1

Upload: others

Post on 22-Apr-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Stat 13, Intro. to Statistical Methods for the Life and Health Sciences.

1.Commonproblemswithregression.a.Inferringcausation.b.Extrapolation.c.Curvature.

2.Testingsignificanceofcorrelationorslope.

NoclassThuNov24,Thanksgiving.Readch9.Hw4is10.1.8,10.3.14,10.3.21,10.4.11andisdueTueNov29.

ThefinalFri Dec9,8am-11,right here,willbeonch1-10.BringaPENCILandCALCULATORandanybooksornotesyouwant.Nocomputers.http://www.stat.ucla.edu/~frederic/13/F16.

1

Page 2: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Commonproblemswithregression.

• a.Correlationisnotcausation.Especiallywithobservationaldata.

Page 3: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Commonproblemswithregression.

Page 4: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Commonproblemswithregression.Holmes and Willett (2004) reviewed all prospective studies on fat consumption and breast cancer with at least 200 cases of breast cancer. "Not one study reported a significant positive association with total fat intake.... Overall, no association was observed between intake of total, saturated, monounsaturated, or polyunsaturated fat and risk for breast cancer."

They also state "The dietary fat hypothesis is largely based on the observation that national per capita fat consumption is highly correlated with breast cancer mortality rates. However, per capita fat consumption is highly correlated with economic development. Also, low parity and late age at first birth, greater body fat, and lower levels of physical activity are more prevalent in Western countries, and would be expected to confound the association with dietary fat."

Page 5: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Commonproblemswithregression.• b.Extrapolation.

Page 6: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Commonproblemswithregression.• b.Extrapolation.• Oftenresearchersextrapolatefromhighdosestolow.

Page 7: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Commonproblemswithregression.• b.Extrapolation.Therelationshipcanbenonlinearthough.Researchersalsooftenextrapolatefromanimalstohumans.

Zaichkina etal.(2004)onhamsters

Page 8: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Commonproblemswithregression.• c.Curvature.Thebestfittinglinemightfitpoorly.Portetal.(2005).

Page 9: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Commonproblemswithregression.• c.Curvature.Thebestfittinglinemightfitpoorly.Wongetal.(2011).

Page 10: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Howwelldoesthelinefit?• 𝑟" isameasureoffit.Itindicatestheamountofscatteraroundthebestfittingline.• Residualplotscanindicatecurvature,outliers,orheteroskedasticity.

• 1 − 𝑟"� 𝑠(isusefulasameasureofhowfaroffpredictionswouldhavebeenonaverage.

Notethatregressionresidualshavemeanzero,whetherthelinefitswellorpoorly.

Page 11: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Commonproblemswithregression.• d.Statisticalsignificance.Couldtheobservedcorrelationjustbeduetochancealone?

Page 12: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

InferencefortheRegressionSlope:Theory-BasedApproachSection10.5

Page 13: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Dostudentswhospendmoretimeinnon-academicactivitiestendtohavelowerGPAs?Example10.4

Page 14: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Dostudentswhospendmoretimeinnon-academicactivitiestendtohavelowerGPAs?

• Thesubjectswere34undergraduatestudentsfromtheUniversityofMinnesota.• Theywereaskedquestionsabouthowmuchtimetheyspentinactivitieslikework,watchingTV,exercising,non-academiccomputeruse,etc.aswellaswhattheircurrentGPAwas.• WearegoingtotesttoseeifthereisanegativeassociationbetweenthenumberofhoursperweekspentonnonacademicactivitiesandGPA.

Page 15: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Hypotheses• NullHypothesis:ThereisnoassociationbetweenthenumberofhoursstudentsspendonnonacademicactivitiesandstudentGPAinthepopulation.

• AlternativeHypothesis:ThereisanegativeassociationbetweenthenumberofhoursstudentsspendonnonacademicactivitiesandstudentGPAinthepopulation.

Page 16: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

DescriptiveStatistics• GPAA = 3.60 − 0.0059(nonacademichours).• Whatdotheslopeandy-interceptmean?

Page 17: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

ShuffletoDevelopNullDistribution

• Wearegoingtoshufflejustaswedidwithcorrelationtodevelopanulldistribution.• Theonlydifferenceisthatwewillbecalculatingtheslopeeachtimeandusingthatasourstatistic.• atestofassociationbasedonslopeisequivalenttoatestofassociationbasedonacorrelationcoefficient.

Page 18: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Betavs Rho• Testingtheslopeoftheregressionlineisequivalenttotestingthecorrelation (samep-value,butobviouslydifferentconfidenceintervalssincethestatisticsaredifferent)• Hencethesehypothesesareequivalent.• Ho:β =0Ha:β <0(Slope)• Ho:ρ =0Ha:ρ < 0(Correlation)

• Sampleslope(b)Population(β:beta)• Samplecorrelation(r)Population(ρ:rho)

• Whenwedothetheorybasedtest,wewillbeusingthet-statisticwhichcanbecalculatedfromeithertheslopeorcorrelation.

Page 19: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

Introduction• Ournulldistributionsareagainbell-shapedandcenteredat0(foreithercorrelationorslopeasourstatistic).

The book on p549 finds a p value of 3.3% by simulation.

Page 20: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

ValidityConditions• Undercertainconditions,theory-basedinferenceforcorrelationorslopeoftheregressionlineuset-distributions.• Wecouldusesimulationsorthetheory-basedmethodsfortheslopeoftheregressionline.• Wewouldgetthesamep-valueifweusedcorrelationasourstatistic.

Page 21: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

PredictingHeartRatefromBodyTemperatureExample10.5A

Page 22: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

HeartRateandBodyTemp• Earlierwelookedattherelationshipbetweenheartrateandbodytemperaturewith130healthyadults• PredictedHeartRate = −166.3 + 2.44 Temp• r=0.257

Page 23: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

HeartRateandBodyTemp

• Wetestedtoseeifwehadconvincingevidencethatthereisapositiveassociationbetweenheartrateandbodytemperatureinthepopulationusingasimulation-basedapproach.(Wewillmakeit2-sidedthistime.)• NullHypothesis:Thereisnoassociationbetweenheartrateandbodytemperatureinthepopulation. β=0• AlternativeHypothesis:Thereisanassociationbetweenheartrateandbodytemperatureinthepopulation. β≠0

Page 24: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

HeartRateandBodyTemp

Wegetaverysmallp-value(0.0036).Anythingasextremeasourobservedslopeof2.44happeningbychanceisveryrare

Page 25: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

HeartRateandBodyTemp

• Wecanalsoapproximatea95%confidenceintervalobservedstatistic+ 2SDofstatistic2.44± 2(0.842)=0.76to4.12

• Whatdoesthismean?We’re95%confidentthat,inthepopulationofhealthyadults,each1° increaseinbodytempisassociatedwithanincreaseinheartrateofbetween0.76to4.12beatsperminute

Page 26: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

HeartRateandBodyTemp

• Thetheory-basedapproachshouldworkwellsincethedistributionhasanicebellshape• Alsocheckthescatterplot

Page 27: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

HeartRateandBodyTemp

• Wewillusethet-statistictogetourtheory-basedp-value.• Wewillfindatheory-basedconfidenceintervalfortheslope.• Onp554,thebooknotestheformula

• t= PQRSTURT

�.

• Herethetstatisticis2.97.• Thep-valueis.36%.Sothecorrelationisstatisticallysignificantlygreaterthanzero.

Page 28: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

SmokingandDrinkingExample10.5B

Page 29: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

ValidityConditionsRememberourvalidityconditionsfortheory-basedinferenceforslopeoftheregressionequation.

1. Thescatterplotshouldfollowalineartrend.2. Thereshouldbeapproximatelythesamenumber

ofpointsaboveandbelowtheregressionline(symmetry).

3. Thevariabilityofverticalslicesofthepointsshouldbesimilar.Thisiscalledhomoskedasticity.

Page 30: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

ValidityConditions• Let’slookatsomescatterplotsthatdonotmeettherequirements.

Page 31: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

SmokingandDrinkingTherelationshipbetweennumberofdrinksandcigarettesperweekforarandomsampleofstudentsatHopeCollege.

Thedotat(0,0)represents524students

Aretheconditionsmet?Hardtosay.Thebooksaysno.

Page 32: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

SmokingandDrinking• Whentheconditionsarenotmet,applyingsimulation-basedinferenceispreferabletotheory-basedt-testsandCIs.

Page 33: 1. Common problems with regression. a. Inferring causation ...frederic/13/F16/day15.pdfthat there is a positive association between heart rate and body temperature in the population

ValidityConditions

• Whatdoyoudowhenvalidityconditionsaren’tmetfortheory-basedinference?• Usethesimulated-basedapproach.

• Anotherstrategyisto“transform”thedataonadifferentscalesoconditionsaremet.• Thelogarithmicscaleiscommon.

• Onecanalsofitadifferentcurve,notnecessarilyaline.