10/13/15
Statistics: Unlocking the Power of Data (Lock5)
STAT 250, Dr. Kari Lock Morgan
Simple Linear Regression
SECTION 9.1
• Inference for correlation
• Inference for slope
• Conditions for inference

Question of the Day
Is the size of certain regions of your brain associated with the size of your social network?

Social Networks and the Brain
• Data from 40 students at City College London
• How to measure brain size?
• How to measure social network size?
Source: R. Kanai, B. Bahrami, R. Roylance, and G. Rees (2011). Online social network size is reflected in human brain structure, Proceedings of the Royal Society B: Biological Sciences. 10/19/11.

Measuring Brain Size
• Structural magnetic resonance imaging (MRI)
• Voxel-based morphometry (VBM) to compute regional grey matter volume based on T1-weighted anatomical MRI scans
• Brain regions found significant in initial study
  ◦ Amygdala (emotion and emotional memory)
  ◦ Middle temporal gyrus (social perception)
  ◦ Entorhinal cortex (memory and navigation)
  ◦ Superior temporal sulcus (perception of others)
• Response: normalized z-score of grey matter density for these brain regions

Brain Regions
[Image from "Do our Brains Determine our Facebook Friend Count?" (www.nature.com)]

Social Networks and the Brain
• How to measure size of social network?
  ◦ How many were present at your 18th or 21st birthday party?
  ◦ If you were going to have a party now, how many people would you invite?
  ◦ What is the total number of friends in your phonebook?
  ◦ Write down the names of the people to whom you would send a text message marking a celebratory event. How many people is that?
  ◦ Write down the names of people in your phonebook you would meet for a chat in a small group (one to three people). How many people is that?
  ◦ How many friends have you kept from school and university whom you could have a friendly conversation with now?
  ◦ How many friends do you have on ‘Facebook’?
  ◦ How many friends do you have from outside school or university?
  ◦ Write down the names of the people of whom you feel you could ask a favor and expect to have it granted. How many people is that?

Social Networks and the Brain
r = 0.436
Is the association significant?

Standard Error Formulas

Parameter                   Distribution                 Standard Error
Proportion                  Normal                       $\sqrt{p(1-p)/n}$
Difference in Proportions   Normal                       $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$
Mean                        t, df = n - 1                $\sqrt{\sigma^2/n}$
Difference in Means         t, df = min(n1, n2) - 1      $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Correlation                 t, df = n - 2                $\sqrt{(1-\rho^2)/(n-2)}$

Social Networks and the Brain
• Is the grey matter volume of these regions of the brain significantly correlated with number of Facebook friends?
• From n = 40 people, we find r = 0.436. Is this significant?

Social Networks and the Brain
1. State hypotheses: $H_0: \rho = 0$, $H_a: \rho \neq 0$
2. Check conditions: n = 40 ≥ 30
3. Calculate test statistic: with r = 0.436,
   $SE = \sqrt{\frac{1 - 0.436^2}{40 - 2}} = 0.146$
   $t = \frac{r - 0}{SE} = \frac{0.436}{0.146} = 2.99$
4. Compute p-value: p-value = 0.0048
5. Interpret in context: This provides strong evidence that the grey matter density of these regions of the brain and number of Facebook friends are positively correlated.
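
The five steps above can be checked numerically. The following is a minimal sketch in Python using only the standard library; the helper names `t_pdf` and `two_sided_p` are hypothetical, and the p-value is approximated by numerically integrating the t density (Simpson's rule) rather than read from software such as Minitab, so the last digit may differ slightly from the slide.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value using Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - area)       # both tails

# Values from the slide
r, n = 0.436, 40

se = math.sqrt((1 - r ** 2) / (n - 2))   # standard error for a correlation
t_stat = (r - 0) / se                    # null value is rho = 0
p_value = two_sided_p(t_stat, df=n - 2)

print(f"SE = {se:.3f}, t = {t_stat:.2f}, p-value = {p_value:.4f}")
```

The small p-value comes from comparing t to a t-distribution with n - 2 = 38 degrees of freedom, matching the Correlation row of the standard error table.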

Social Networks and the Brain
Should you go out and add more Facebook friends to increase the size of your brain?
a) Yes
b) No

$R^2$
$R^2$ is the proportion of the variability in the response variable, Y, that is explained by the explanatory variable, X
• For simple linear regression, $R^2 = r^2$ ($R^2$ is just the sample correlation squared)
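
As a quick check of the claim that $R^2 = r^2$ in simple linear regression, here is a small sketch that computes both quantities separately. The data are hypothetical toy values chosen for illustration, not the brain study data.

```python
import math

# Hypothetical toy data, for illustration only (not the brain study data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variability in Y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Sample correlation
r = sxy / math.sqrt(sxx * syy)

# Least squares line and the variability it leaves unexplained
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# R^2 as the proportion of variability in Y explained by X
r_squared = 1 - sse / syy

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, R^2 = {r_squared:.4f}")
```

The two printed values agree up to floating-point rounding, whichever toy data are used.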

$R^2$
[Figure: two plots, one with $R^2 = 0.67$ and one with $R^2 = 0.09$]
How much does the variability in Y decrease if you know X?

Regression in Minitab
• Stat -> Regression -> Fitted Line Plot
$0.436^2 = 0.19$

Prediction
Should you use this equation to predict the normalized size of these regions of your brain?
a) Yes
b) No

Sample to Population
• Everything we have done so far with regression is based solely on sample data
• Now, we will extend from the sample to the population
• Statistical inference!

Simple Linear Model
• The population/true simple linear model is
  $y = \beta_0 + \beta_1 x + \varepsilon$
  where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the random error
• $\beta_0$ and $\beta_1$ are unknown parameters
• Can use familiar inference methods!

Inference for the Slope
• Test for whether the slope is significantly different from 0 (whether there is any linear relationship between x and y):
  $H_0: \beta_1 = 0$
  $H_a: \beta_1 \neq 0$
• Confidence interval for the true slope!

Inference for the Slope
• Confidence intervals and hypothesis tests for the slope can be done using the familiar formulas:
  $t = \frac{\text{sample statistic} - \text{null value}}{SE}$
  $\text{sample statistic} \pm t^* \times SE$
• Population parameter: $\beta_1$; sample statistic: $\hat{\beta}_1$
• Use t-distribution with n - 2 degrees of freedom
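
The two familiar formulas can be applied to a slope as in the sketch below. Several things here are assumptions not taken from the slides: the toy data are hypothetical, the standard error of the slope is computed with the usual textbook formula $SE = s / \sqrt{\sum (x_i - \bar{x})^2}$ with $s = \sqrt{SSE/(n-2)}$ (the slides read this value off Minitab output instead), and $t^* = 2.306$ is the standard t-table value for 95% confidence with 8 degrees of freedom.

```python
import math

# Hypothetical toy data (n = 10), for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.8, 3.1, 3.9, 5.2, 5.8, 7.1, 8.3, 8.8, 10.2, 10.9]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares estimates: the sample statistic is the fitted slope b1
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the slope (textbook formula, an assumption here)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
se_b1 = s / math.sqrt(sxx)

# Test statistic: (sample statistic - null value) / SE, with df = n - 2
null_value = 0.0
t_stat = (b1 - null_value) / se_b1

# Confidence interval: sample statistic +/- t* x SE
T_STAR = 2.306  # 95% confidence, df = n - 2 = 8, from a t-table
ci_low = b1 - T_STAR * se_b1
ci_high = b1 + T_STAR * se_b1

print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, t = {t_stat:.2f}")
print(f"95% CI for the true slope: ({ci_low:.3f}, {ci_high:.3f})")
```

For a different sample size, `T_STAR` would need to be replaced by the critical value for n - 2 degrees of freedom.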

Regression in Minitab
Stat -> Regression -> Regression -> Fit Regression Model

Inference for Slope
n = 40
Is the slope significantly different from 0?
(a) Yes
(b) No
Give a 95% confidence interval for the true slope.

Hypothesis Test

Regression in Minitab
Stat -> Regression -> Regression -> Fit Regression Model

Two Quantitative Variables
• The t-statistic (and p-value) for a test for a non-zero slope and a test for a non-zero correlation are identical!
• They are equivalent ways of testing for a linear association between two quantitative variables.
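
The equivalence of the two tests can be seen numerically. This is a sketch on hypothetical toy data; the function name `slope_and_correlation_t` is made up for illustration, and the slope's standard error uses the usual textbook formula, which the slides do not state.

```python
import math

def slope_and_correlation_t(xs, ys):
    """Return (t for the slope test, t for the correlation test)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Test for slope: t = b1 / SE(b1)
    b1 = sxy / sxx
    sse = syy - b1 * sxy  # residual sum of squares of the least squares line
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    t_slope = b1 / se_b1

    # Test for correlation: t = r / sqrt((1 - r^2) / (n - 2))
    r = sxy / math.sqrt(sxx * syy)
    se_r = math.sqrt((1 - r ** 2) / (n - 2))
    t_corr = r / se_r

    return t_slope, t_corr

# Hypothetical toy data, for illustration only.
xs = [2, 4, 5, 7, 8, 10, 11, 13]
ys = [3.5, 2.9, 6.1, 5.0, 8.2, 7.1, 9.8, 9.0]

t_slope, t_corr = slope_and_correlation_t(xs, ys)
print(f"t for slope = {t_slope:.4f}, t for correlation = {t_corr:.4f}")
```

Because both statistics are compared to a t-distribution with n - 2 degrees of freedom, identical t values imply identical p-values.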

Confidence Interval

Multiple Testing?

False Positive (Type I Error) Protection
• To further protect against Type I errors, they performed two independent analyses on two separate samples (n = 125, then n = 40)

Real-World Network Size
• What about real-world network size?

Conditions
Inference based on the simple linear model is only valid if the following conditions hold:
1) Linearity
2) Constant Variability of Residuals
3) Normality of Residuals
4) Independence

Linearity
• The relationship between x and y is linear (it makes sense to draw a line through the scatterplot)

Dog Years
• 1 dog year = 7 human years
• Linear: human age = 7 × dog age
• From www.dogyears.com: “The old rule-of-thumb that one dog year equals seven years of a human life is not accurate. The ratio is higher with youth and decreases a bit as the dog ages.”
[Figure: Charlie; plot comparing the LINEAR rule with the ACTUAL relationship]
A linear model can still be useful, even if it doesn’t perfectly fit the data.

“All models are wrong, but some are useful”
- George Box

Residuals (errors)
$\varepsilon_i \sim N(0, \sigma_\varepsilon)$
Conditions for residuals:
• The errors are normally distributed (check with a histogram)
• The average of the errors is 0 (always true for least squares regression)
• The standard deviation of the errors is constant for all cases (constant spread of points around the line)
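
The residual conditions above can be explored with a short sketch. The data are hypothetical toy values; the text histogram and the lower-half versus upper-half comparison of spread are rough illustrative checks chosen here, not a procedure taken from the slides, which use Minitab plots instead.

```python
import math
import statistics
from collections import Counter

# Hypothetical toy data, already sorted by x, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
ys = [2.3, 3.8, 6.4, 7.9, 10.2, 11.6, 14.5, 15.8, 18.1, 20.4, 21.7, 24.3]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares fit and residuals
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Average of the residuals: zero (up to rounding) for any least squares fit
mean_resid = sum(residuals) / n
print(f"mean residual = {mean_resid:.2e}")

# Rough constant-variability check: spread for small x versus large x
half = n // 2
spread_low = statistics.pstdev(residuals[:half])
spread_high = statistics.pstdev(residuals[half:])
print(f"residual SD, lower half of x: {spread_low:.3f}")
print(f"residual SD, upper half of x: {spread_high:.3f}")

# Text histogram of residuals to eyeball normality
bin_width = 0.25
bins = Counter(math.floor(e / bin_width) for e in residuals)
for b in sorted(bins):
    lo, hi = b * bin_width, (b + 1) * bin_width
    print(f"{lo:6.2f} to {hi:6.2f} | {'*' * bins[b]}")
```

With only a dozen points these checks are crude; they are meant to show what each condition refers to rather than to replace a residual plot.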

Regression in Minitab
Is the association approximately linear?
a) Yes
b) No
Is the spread of the points around the line approximately constant?
a) Yes
b) No

Histogram of Residuals
Are the residuals approximately normally distributed?
a) Yes
b) No

Non-Constant Variability

Non-Normal Residuals

Independence
• Cases must be independent of each other (one case’s values do not affect another case’s values)
• Most common violation of this: data over time
• What would make the independence condition satisfied or violated in the social network and brain size data?

Conditions not Met?
• If the association isn’t linear: don’t use simple linear regression
• If variability is not constant, residuals are not normal, or cases are not independent: the model itself is still valid, but inference may not be accurate
• If you want to do something more fancy so the conditions are met… take STAT 462!

Simple Linear Regression
1) Plot your data!
  • Association approximately linear?
  • Outliers?
  • Constant variability?
2) Fit the model (least squares)
3) Use the model
  • Interpret coefficients
  • Make predictions
4) Look at histogram of residuals (normal?)
5) Inference (extend to population)
  • Inference on slope (interval and test)

To Do
• Read Section 9.1
• Do HW 9.1 (due Friday, 12/4)