Simple Linear Regression & Correlation
Instructor: Prof. Wei Zhu
11/21/2013
AMS 572 Group Project
Outline

1. Motivation & Introduction – Lizhou Nie
2. A Probabilistic Model for Simple Linear Regression – Long Wang
3. Fitting the Simple Linear Regression Model – Zexi Han
4. Statistical Inference for Simple Linear Regression – Lichao Su
5. Regression Diagnostics – Jue Huang
6. Correlation Analysis – Ting Sun
7. Implementation in SAS – Qianyi Chen
8. Application and Summary – Jie Shuai
1. Motivation
http://popperfont.net/2012/11/13/the-ultimate-solar-system-animated-gif/
Fig. 1.1 Simplified Model for Solar System
Fig. 1.2 Obama & Romney during Presidential Election Campaign
http://outfront.blogs.cnn.com/2012/08/14/the-most-negative-in-campaign-history/
Introduction

• Regression Analysis
  Linear Regression:
    Simple Linear Regression: {y; x}
    Multiple Linear Regression: {y; x1, …, xp}
    Multivariate Linear Regression: {y1, …, yn; x1, …, xp}
• Correlation Analysis
  Pearson Product-Moment Correlation Coefficient: a measurement of the linear relationship between two variables
History

• Adrien-Marie Legendre: earliest form of regression, the least squares method
• Carl Friedrich Gauss: further development of least squares theory, including the Gauss-Markov theorem
• Sir Francis Galton: coined the term "regression"
• George Udny Yule & Karl Pearson: extension to a more generalized statistical context

http://en.wikipedia.org/wiki/Regression_analysis
http://en.wikipedia.org/wiki/Adrien_Marie_Legendre
http://en.wikipedia.org/wiki/Carl_Friedrich_Gauss
http://en.wikipedia.org/wiki/Francis_Galton
http://www.york.ac.uk/depts/maths/histstat/people/yule.gif
http://en.wikipedia.org/wiki/Karl_Pearson
Simple Linear Regression
- A special case of linear regression
- One response variable and one explanatory variable

General Setting
- We denote the explanatory variable by x and the response variable by y
- n pairs of observations (xi, yi), i = 1, …, n

2. A Probabilistic Model
Sketch the Graph

2. A Probabilistic Model

An example data set of n = 100 observations (e.g., the point (29, 5.5)):

  i      X        Y
  1    37.70     9.82
  2    16.31     5.00
  3    28.37     9.27
  4   -12.13     2.98
  …
 98     9.06     7.34
 99    28.54    10.37
100   -17.19     2.33
In simple linear regression, the data are described by

$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, n,$

where the $\epsilon_i$ are i.i.d. $\sim N(0, \sigma^2)$.

The fitted model is

$\hat{y} = \hat\beta_0 + \hat\beta_1 x$

where $\hat\beta_0$ is the intercept and $\hat\beta_1$ is the slope of the regression line.
2. A Probabilistic Model
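This probabilistic model is easy to simulate. A minimal Python sketch (the slides use SAS and MATLAB; the parameter values here are hypothetical, chosen only for illustration):

```python
import random

random.seed(2013)

# Hypothetical true parameters, for illustration only
beta0, beta1, sigma = 4.0, 0.15, 1.5
n = 100

x = [random.gauss(10, 15) for _ in range(n)]        # explanatory variable
eps = [random.gauss(0, sigma) for _ in range(n)]    # i.i.d. N(0, sigma^2) errors
y = [beta0 + beta1 * xi + e for xi, e in zip(x, eps)]
print(len(y))  # 100 simulated (x_i, y_i) pairs
```

A data set generated this way looks like the 100-observation example above.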
3. Fitting the Simple Linear Regression Model
Fig 3.1. Scatter plot of tire tread wear vs. mileage (x-axis: Mileage in 1000 miles; y-axis: Groove depth in mils). From: Statistics and Data Analysis; Tamhane and Dunlop; Prentice Hall.

Table 3.1.
Mileage (in 1000 miles)   Groove Depth (in mils)
 0                        394.33
 4                        329.50
 8                        291.00
12                        255.17
16                        229.33
20                        204.83
24                        179.00
28                        163.83
32                        150.33
The difference between the fitted line and the real data is measured by

$Q = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2$

Our goal: minimize this sum of squares.
3. Fitting the Simple Linear Regression Model
Fig 3.2. Scatter plot with the fitted line (x-axis: Mileage in 1000 miles; y-axis: Groove depth in mils).

$e_i$ is the vertical distance between the fitted line and the real data:

$e_i = y_i - \hat{y}_i$
3. Fitting the Simple Linear Regression Model
Least Square Method

Setting $\partial Q/\partial \beta_0 = 0$ and $\partial Q/\partial \beta_1 = 0$ yields the normal equations:

$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$

$\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$
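The normal equations are a 2x2 linear system, so they can be solved directly. A small Python sketch (the slides use SAS and MATLAB) applying Cramer's rule to a tiny made-up data set:

```python
# Tiny hypothetical data set, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.1, 8.0]
n = len(x)

sx = sum(x)
sy = sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations:
#   n*b0  + sx*b1  = sy
#   sx*b0 + sxx*b1 = sxy
det = n * sxx - sx * sx            # determinant of the system
b0 = (sxx * sy - sx * sxy) / det   # intercept
b1 = (n * sxy - sx * sy) / det     # slope
print(b0, b1)
```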
Solving the normal equations gives

$\hat\beta_0 = \frac{\left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i=1}^{n} y_i\right) - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} x_i y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$

$\hat\beta_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$
3. Fitting the Simple Linear Regression Model
To simplify, we denote:

$S_{xy} = \sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$

$S_{xx} = \sum_{i=1}^{n}(x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$

$S_{yy} = \sum_{i=1}^{n}(y_i - \bar y)^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2$

so that $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.
3. Fitting the Simple Linear Regression Model
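The $S_{xx}$, $S_{xy}$ formulas can be applied directly to the Table 3.1 data. A Python sketch (the slides use SAS and MATLAB):

```python
# Tire tread wear data from Table 3.1
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = Sxy / Sxx           # slope
b0 = ybar - b1 * xbar    # intercept
print(b0, b1)
```

For these data $S_{xx} = 960$, and the slope and intercept come out near -7.28 and 360.64.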
Back to the example: from Table 3.1, $S_{xx} = 960$ and $S_{xy} = -6989.4$, so $\hat\beta_1 = S_{xy}/S_{xx} = -7.281$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 360.64$.

3. Fitting the Simple Linear Regression Model

Therefore, the equation of the fitted line is $\hat y = 360.64 - 7.281x$.

But this alone is not enough; we must check how well the line fits.
3. Fitting the Simple Linear Regression Model
We define:

$SST = \sum_{i=1}^{n}(y_i - \bar y)^2$ (total sum of squares)
$SSR = \sum_{i=1}^{n}(\hat y_i - \bar y)^2$ (regression sum of squares)
$SSE = \sum_{i=1}^{n}(y_i - \hat y_i)^2$ (error sum of squares)

One can prove: $SST = SSR + SSE$

The ratio $r^2 = SSR/SST = 1 - SSE/SST$ is called the coefficient of determination.

3. Fitting the Simple Linear Regression Model: Check the goodness of fit of the LS line
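The decomposition $SST = SSR + SSE$ can be verified numerically on the tire data. A Python sketch (the slides use SAS and MATLAB):

```python
# Tire tread wear data from Table 3.1
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]                     # fitted values

SST = sum((yi - ybar) ** 2 for yi in y)               # total
SSR = sum((yh - ybar) ** 2 for yh in yhat)            # regression
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # error
r2 = SSR / SST                                        # coefficient of determination
print(r2)
```

The three sums of squares are computed independently here, yet SST matches SSR + SSE up to rounding, and r squared comes out near 0.953.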
Back to the example:
3. Fitting the Simple Linear Regression Model Check the goodness of fit of LS line
$r^2 = 0.953$ and $r = -\sqrt{0.953} = -0.976$, where the sign of r follows from the sign of $\hat\beta_1$. Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.
r is the sample correlation coefficient between X and Y:

$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$

For simple linear regression,

$r = \hat\beta_1 \sqrt{\frac{S_{xx}}{S_{yy}}}$
3. Fitting the Simple Linear Regression Model
Estimation of $\sigma^2$

The variance $\sigma^2$ measures the scatter of the $y_i$ around their means $\mu_i = \beta_0 + \beta_1 x_i$.

An unbiased estimate of $\sigma^2$ is given by

$s^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^{n} e_i^2}{n-2}$
3. Fitting the Simple Linear Regression Model
From the example, we have SSE = 2531.5 and n - 2 = 7; therefore

$s^2 = \frac{2531.5}{7} = 361.6,$

which has 7 d.f. The estimate of $\sigma$ is $s = \sqrt{361.6} = 19.0$.
3. Fitting the Simple Linear Regression Model
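The error-variance estimate can be recomputed directly from the Table 3.1 data. A Python sketch (the slides use SAS and MATLAB):

```python
import math

# Tire tread wear data from Table 3.1
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar

SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = SSE / (n - 2)       # unbiased estimate of sigma^2, with n-2 = 7 d.f.
s = math.sqrt(s2)        # estimate of sigma
print(SSE, s2, s)
```

This gives SSE near 2531.5, s squared near 361.6, and s near 19.0.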
4. Statistical Inference For SLR
Under the normal error assumption:

* Point estimators $\hat\beta_0$ and $\hat\beta_1$ are unbiased: $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$.

* Sampling distributions of $\hat\beta_0$ and $\hat\beta_1$:

$\hat\beta_0 \sim N\left(\beta_0,\ \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n S_{xx}}\right), \qquad \hat\beta_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right)$

* Estimated standard errors:

$SE(\hat\beta_0) = s\sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n S_{xx}}}, \qquad SE(\hat\beta_1) = \frac{s}{\sqrt{S_{xx}}}$
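For the tire data, these standard errors can be computed numerically. A Python sketch (the slides use SAS and MATLAB):

```python
import math

# Tire tread wear data from Table 3.1
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(SSE / (n - 2))

SE_b1 = s / math.sqrt(Sxx)
SE_b0 = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * Sxx))
print(SE_b0, SE_b1)
```

For these data SE of the slope comes out near 0.61 and SE of the intercept near 11.7.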
Derivation

Writing $\hat\beta_1$ as a linear combination of the $Y_i$, $\hat\beta_1 = \sum_{i=1}^{n} \frac{(x_i - \bar x)}{S_{xx}} Y_i$:

$E(\hat\beta_1) = \sum_{i=1}^{n} \frac{(x_i - \bar x)}{S_{xx}} E(Y_i) = \sum_{i=1}^{n} \frac{(x_i - \bar x)(\beta_0 + \beta_1 x_i)}{S_{xx}} = \beta_1 \frac{\sum_{i=1}^{n}(x_i - \bar x)x_i}{S_{xx}} = \beta_1 \frac{S_{xx}}{S_{xx}} = \beta_1$

$Var(\hat\beta_1) = \sum_{i=1}^{n} \frac{(x_i - \bar x)^2}{S_{xx}^2} Var(Y_i) = \frac{\sigma^2 S_{xx}}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}$

For $\hat\beta_0 = \bar Y - \hat\beta_1 \bar x$:

$E(\hat\beta_0) = E(\bar Y) - \bar x E(\hat\beta_1) = (\beta_0 + \beta_1 \bar x) - \beta_1 \bar x = \beta_0$

$Var(\hat\beta_0) = Var(\bar Y) + \bar x^2 Var(\hat\beta_1) = \frac{\sigma^2}{n} + \frac{\sigma^2 \bar x^2}{S_{xx}} = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n S_{xx}}$

For complete mathematical derivations, please refer to the Tamhane and Dunlop textbook, p. 331.
Statistical Inference on β0 and β1

* Pivotal Quantities (P.Q.'s):

$\frac{\hat\beta_0 - \beta_0}{SE(\hat\beta_0)} \sim t_{n-2}, \qquad \frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)} \sim t_{n-2}$

* Confidence Intervals (C.I.'s):

$\hat\beta_0 \pm t_{n-2,\alpha/2}\, SE(\hat\beta_0), \qquad \hat\beta_1 \pm t_{n-2,\alpha/2}\, SE(\hat\beta_1)$
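A 95% confidence interval for the slope of the tire-data fit can be sketched in Python (the slides use SAS and MATLAB; the t critical value is taken from a standard t-table):

```python
import math

# Tire tread wear data from Table 3.1
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(SSE / (n - 2))
SE_b1 = s / math.sqrt(Sxx)

t7 = 2.365                                    # t_{7, .025} from a t-table
ci = (b1 - t7 * SE_b1, b1 + t7 * SE_b1)       # 95% C.I. for beta_1
print(ci)
```

For these data the interval comes out near (-8.73, -5.83); it excludes zero, consistent with a strong negative slope.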
Hypothesis tests: a useful application is to show whether there is a linear relationship between x and y.

$H_0: \beta_1 = \beta_1^0$ vs. $H_a: \beta_1 \ne \beta_1^0$

Reject $H_0$ at level $\alpha$ if $|t_0| = \left|\frac{\hat\beta_1 - \beta_1^0}{SE(\hat\beta_1)}\right| > t_{n-2,\alpha/2}$

In particular, for $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \ne 0$:

Reject $H_0$ at level $\alpha$ if $|t_0| = \left|\frac{\hat\beta_1}{SE(\hat\beta_1)}\right| > t_{n-2,\alpha/2}$
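The test of $H_0: \beta_1 = 0$ on the tire data can be sketched in Python (the slides use SAS and MATLAB; the critical value 2.365 is $t_{7,.025}$ from a t-table):

```python
import math

# Tire tread wear data from Table 3.1
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(SSE / (n - 2))
SE_b1 = s / math.sqrt(Sxx)

t0 = b1 / SE_b1            # test statistic for H0: beta_1 = 0
reject = abs(t0) > 2.365   # compare with t_{7, .025}
print(t0, reject)
```

Here t0 comes out near -11.9, far beyond the critical value, so H0 is rejected: mileage and groove depth are linearly related.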
Analysis of Variance (ANOVA)

Mean Square: a sum of squares divided by its degrees of freedom:

$MSR = \frac{SSR}{1}, \qquad MSE = \frac{SSE}{n-2}$

$F = \frac{MSR}{MSE} = \frac{SSR}{s^2} = \frac{\hat\beta_1^2 S_{xx}}{s^2} = \left(\frac{\hat\beta_1}{SE(\hat\beta_1)}\right)^2 = t_0^2$

where $t_0$ is the t-statistic for testing $H_0: \beta_1 = 0$; correspondingly, $f_{1,n-2,\alpha} = t_{n-2,\alpha/2}^2$.
Analysis of Variance (ANOVA)

ANOVA Table

Source of Variation   SS    d.f.    MS                  F
Regression            SSR   1       MSR = SSR/1         F = MSR/MSE
Error                 SSE   n - 2   MSE = SSE/(n - 2)
Total                 SST   n - 1
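The identity F = t0 squared can be checked numerically on the tire data. A Python sketch (the slides use SAS and MATLAB):

```python
import math

# Tire tread wear data from Table 3.1
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]
SSR = sum((yh - ybar) ** 2 for yh in yhat)
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))

MSR = SSR / 1
MSE = SSE / (n - 2)
F = MSR / MSE                               # ANOVA F statistic
t0 = b1 / (math.sqrt(MSE) / math.sqrt(Sxx)) # t statistic for H0: beta_1 = 0
print(F, t0 ** 2)
```

For these data F comes out near 140.7 and equals t0 squared up to rounding.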
5. Regression Diagnostics

5.1 Checking the Model Assumptions
  5.1.1 Checking for Linearity
  5.1.2 Checking for Constant Variance
  5.1.3 Checking for Normality
  Primary tool: residual plots

5.2 Checking for Outliers and Influential Observations
  5.2.1 Checking for Outliers
  5.2.2 Checking for Influential Observations
  5.2.3 How to Deal with Outliers and Influential Observations
5.1.1 Checking for Linearity

Table 5.1 The $x_i$, $y_i$, $\hat y_i$, and $e_i$ for the Tire Wear Data

  i   x_i   y_i      ŷ_i      e_i
  1    0    394.33   360.64    33.69
  2    4    329.50   331.51    -2.01
  3    8    291.00   302.39   -11.39
  4   12    255.17   273.27   -18.10
  5   16    229.33   244.15   -14.82
  6   20    204.83   215.02   -10.19
  7   24    179.00   185.90    -6.90
  8   28    163.83   156.78     7.05
  9   32    150.33   127.66    22.67

Figure 5.1 Scatter plot with the fitted line for the Tire Wear Data

5. Regression Diagnostics
5.1.1 Checking for Linearity (Data transformation)

Figure 5.2 Typical Scatter Plot Shapes and Corresponding Linearizing Transformations (fitting y against x^2, x^3, or √x; transforming y to log y, -1/y, y^2, or y^3; transforming x to log x or -1/x)

5. Regression Diagnostics
5.1.1 Checking for Linearity (Data transformation)

Table 5.2 The $x_i$, $y_i$, $\hat y_i^*$, $\hat y_i$, and $e_i$ for the Tire Wear Data (fit on the log scale, with $\hat y_i = e^{\hat y_i^*}$)

  i   x_i   y_i      ŷ*_i    ŷ_i      e_i
  1    0    394.33   5.926   374.64   19.69
  2    4    329.50   5.807   332.58   -3.08
  3    8    291.00   5.688   295.24   -4.24
  4   12    255.17   5.569   262.09   -6.92
  5   16    229.33   5.450   232.67   -3.34
  6   20    204.83   5.331   206.54   -1.71
  7   24    179.00   5.211   183.36   -4.36
  8   28    163.83   5.092   162.77    1.06
  9   32    150.33   4.973   144.50    5.83

Figure 5.3 Plot of the transformed fit for the Tire Wear Data
5. Regression Diagnostics
5.1.2 Checking for Constant Variance

Plot the residuals against the fitted values. If the constant variance assumption is correct, the dispersion of the $e_i$'s is approximately constant with respect to the $\hat y_i$'s.

Figure 5.3 Plots of Residuals   Figure 5.4 Plots of Residuals
5. Regression Diagnostics
5.1.3 Checking for Normality

Make a normal plot of the residuals: they have a zero mean and an approximately constant variance (assuming the other assumptions about the model are correct).

Figure 5.5 Normal plot of the residuals
5. Regression Diagnostics
Outlier: an observation that does not follow the general pattern of the relationship between y and x. A large residual indicates an outlier.

Standardized residuals are given by

$e_i^* = \frac{e_i}{SE(e_i)} = \frac{e_i}{s\sqrt{1 - \frac{1}{n} - \frac{(x_i - \bar x)^2}{S_{xx}}}}, \quad i = 1, 2, \ldots, n.$

If $|e_i^*| > 2$, the corresponding observation may be regarded as an outlier.

Influential Observation: an influential observation has an extreme x-value, an extreme y-value, or both.

If we express the fitted value $\hat y_i$ as a linear combination of all the $y_j$,

$\hat y_i = \sum_{j=1}^{n} h_{ij} y_j, \qquad h_{ii} = \frac{1}{n} + \frac{(x_i - \bar x)^2}{S_{xx}},$

and if $h_{ii} > 2(k+1)/n$ (here k = 1 predictor), the corresponding observation may be regarded as influential.

5. Regression Diagnostics
![Page 40: Simple Linear Regression & Correlation Instructor: Prof. Wei Zhu 11/21/2013 AMS 572 Group Project](https://reader035.vdocuments.net/reader035/viewer/2022062712/56649c7c5503460f94930f50/html5/thumbnails/40.jpg)
5.2 Checking for Outliers and Influential Observations

Table 5.3 Standardized residuals & leverages for the transformed data

  i   e_i*      h_ii
  1    2.8653   0.3778
  2   -0.4113   0.2611
  3   -0.5367   0.1778
  4   -0.8505   0.1278
  5   -0.4067   0.1111
  6   -0.2102   0.1278
  7   -0.5519   0.1778
  8    0.1416   0.2611
  9    0.8484   0.3778

Observation 1 has $|e_1^*| = 2.87 > 2$ and so may be regarded as an outlier; no $h_{ii}$ exceeds the threshold $2(k+1)/n = 0.44$, so no observation is flagged as influential.

5. Regression Diagnostics
```matlab
clear; clc;
x = [0 4 8 12 16 20 24 28 32];
y = [394.33 329.50 291.00 255.17 229.33 204.83 179.00 163.83 150.33];
y1 = log(y);                              % data transformation
p = polyfit(x,y,1)                        % linear regression predicts y from x
% p = polyfit(x,log(y),1)                 % fit for transformed data
yfit = polyval(p,x)                       % use p to predict y
yresid = y - yfit                         % compute the residuals
% yresid = y1 - yfit                      % residuals for transformed data
ssresid = sum(yresid.^2);                 % residual sum of squares
sstotal = (length(y)-1) * var(y);         % total sum of squares
rsq = 1 - ssresid/sstotal;                % R square
normplot(yresid)                          % normal plot for residuals
[h,pval,jbstat,critval] = jbtest(yresid)  % test normality
scatter(x,y,500,'r','.')                  % generate the scatter plot
lsline
axis([-5,35,-10,25])
xlabel('x_i')
ylabel('y_i')
title('plot of ...')
Sxx = sum((x - mean(x)).^2);              % = 960 for these x
estd = zeros(1,length(x));
for i = 1:length(x)                       % check for outliers: standardized residuals
    estd(i) = yresid(i) / (std(yresid) * ...
        sqrt(1 - 1/length(x) - (x(i)-mean(x))^2/Sxx));
end
q = zeros(1,length(x));
for i = 1:length(x)                       % check for influential observations: leverages
    q(i) = 1/length(x) + (x(i)-mean(x))^2/Sxx;
end
```
MATLAB Code for Regression Diagnostics
Why do we need this? Regression analysis is used to model the relationship between two variables, one treated as the response and the other as the predictor.

But when there is no such distinction and both variables are random, correlation analysis is used to study the strength of the relationship.
6.1 Correlation Analysis
6.1 Correlation Analysis: Example

Figure 6.1 Examples of jointly random variables: flu reported and people who get a flu shot; life expectancy and economy level; temperature and economic growth
Because we need to investigate the correlation between X and Y, we model the two variables jointly.
Source:http://wiki.stat.ucla.edu/socr/index.php/File:SOCR_BivariateNormal_JS_Activity_Fig7.png
6.2 Bivariate Normal Distribution
Figure 6.2
6.2 Why introduce the Bivariate Normal Distribution?

First, we need to do some computation. If (X, Y) has a bivariate normal distribution, the conditional distribution of Y given X = x is

$Y \mid X = x \;\sim\; N\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\ \sigma_Y^2(1-\rho^2)\right)$

Compare with the regression model: $E(Y \mid x) = \beta_0 + \beta_1 x$ with constant variance $\sigma^2$.

So, if (X, Y) has a bivariate normal distribution, then the regression model holds with $\beta_1 = \rho\,\sigma_Y/\sigma_X$, $\beta_0 = \mu_Y - \beta_1\mu_X$, and $\sigma^2 = \sigma_Y^2(1-\rho^2)$.
Define the r.v. R corresponding to r. But the distribution of R is quite complicated.

6.3 Statistical Inference of r

Figure 6.3 Densities f(r) of R for ρ = -0.7, -0.3, 0, and 0.5
6.3 Exact test when ρ = 0

Test: $H_0: \rho = 0$ vs. $H_a: \rho \ne 0$

Test statistic: $T_0 = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}} \sim t_{n-2}$ under $H_0$

Reject $H_0$ iff $|t_0| > t_{n-2,\alpha/2}$

Example: A researcher wants to determine if two test instruments give similar results. The two test instruments are administered to a sample of 15 students. The correlation coefficient between the two sets of scores is found to be 0.7. Is this correlation statistically significant at the .01 level?

$H_0: \rho = 0$ vs. $H_a: \rho \ne 0$

$t_0 = \frac{0.7\sqrt{15-2}}{\sqrt{1-0.7^2}} = 3.534 > t_{13,.005} = 3.012$

So, we reject $H_0$.
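The exact-test statistic for the example above takes one line. A Python sketch (the slides use SAS and MATLAB):

```python
import math

r, n = 0.7, 15
# Exact test statistic for H0: rho = 0, distributed t_{n-2} under H0
t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(t0)
```

This reproduces t0 near 3.534, which exceeds the critical value 3.012, so H0 is rejected at the .01 level.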
6.3 Note: they are the same!

Because $\hat\beta_1 = S_{xy}/S_{xx}$ and $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, we have $\hat\beta_1 = r\sqrt{S_{yy}/S_{xx}}$.

So the t-statistic for testing $H_0: \beta_1 = 0$, $t_0 = \hat\beta_1/SE(\hat\beta_1)$, simplifies to $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, the statistic for testing $H_0: \rho = 0$.

We can say the tests of $H_0: \beta_1 = 0$ and $H_0: \rho = 0$ are equivalent.
6.3 Approximate test when ρ ≠ 0

Because the exact distribution of R is not very useful for making inferences on ρ, R. A. Fisher showed that the following transformation of R is approximately normally distributed:

$\hat\psi = \tanh^{-1}(R) = \frac{1}{2}\ln\frac{1+R}{1-R} \;\sim\; N\left(\psi = \frac{1}{2}\ln\frac{1+\rho}{1-\rho},\ \frac{1}{n-3}\right)$ (approximately)
6.3 Steps to do the approximate test on ρ

1. $H_0: \rho = \rho_0$ vs. $H_1: \rho \ne \rho_0$
2. Point estimator: $\hat\psi = \tanh^{-1}(r) = \frac{1}{2}\ln\frac{1+r}{1-r}$
3. T.S.: $z_0 = (\hat\psi - \psi_0)\sqrt{n-3}$, where $\psi_0 = \frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}$; reject $H_0$ if $|z_0| > z_{\alpha/2}$
4. C.I. for ρ: $\left[\tanh\left(\hat\psi - \frac{z_{\alpha/2}}{\sqrt{n-3}}\right),\ \tanh\left(\hat\psi + \frac{z_{\alpha/2}}{\sqrt{n-3}}\right)\right]$
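The steps above can be sketched in Python (the slides use SAS and MATLAB). The data reuse the earlier example (r = 0.7, n = 15); the null value ρ0 = 0.5 is hypothetical, chosen only for illustration:

```python
import math

r, n = 0.7, 15
rho0 = 0.5               # hypothetical null value, for illustration
z = 1.96                 # z_{.025}

psi_hat = math.atanh(r)  # point estimator (Fisher z transform)
psi0 = math.atanh(rho0)
z0 = (psi_hat - psi0) * math.sqrt(n - 3)   # test statistic

half = z / math.sqrt(n - 3)
lo = math.tanh(psi_hat - half)             # 95% C.I. for rho
hi = math.tanh(psi_hat + half)
print(z0, lo, hi)
```

Here z0 comes out near 1.10, below 1.96, and the interval (about 0.29 to 0.89) contains 0.5, so H0: ρ = 0.5 is not rejected.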
6.4 The pitfalls of correlation analysis

• Lurking variables
• Overextrapolation
7. Implementation in SAS

Table 7.1 vote example data

obs  state  district  democA  voteA  expendA  expendB  prtystrA  lexpendA  lexpendB  shareA
  1  "AL"   7         1       68     328.3    8.74     41        5.793916  2.167567  97.41
  2  "AK"   1         0       62     626.38   402.48   60        6.439952  5.997638  60.88
  3  "AZ"   2         1       73     99.61    3.07     55        4.601233  1.120048  97.01
  …
173  "WI"   8         1       30     14.42    227.82   47        2.668685  5.428569  5.95
SAS code of the vote example:

proc corr data=vote1;
  var F4 F10;
run;

proc reg data=vote1;
  model F4=F10;
  label F4=voteA;
  label F10=shareA;
  output out=fitvote residual=R;
run;

Table 7.2 correlation coefficients

Pearson Correlation Coefficients, N = 173
Prob > |r| under H0: Rho=0

      F4        F10
F4    1.00000   0.92528

7. Implementation in SAS
SAS output

Analysis of Variance

Source            DF    Sum of Squares   Mean Square   F Value   Pr > F
Model              1    41486            41486         1017.70   <.0001
Error            171    6970.77364       40.76476
Corrected Total  172    48457

Root MSE        6.38473    R-Square  0.8561
Dependent Mean  50.50289   Adj R-Sq  0.8553
Coeff Var       12.64230

Parameter Estimates

Variable    Label      DF   Parameter Estimate   Standard Error   t Value
Intercept   Intercept   1   26.81254             0.88719          30.22
F10         F10         1   0.46382              0.01454          31.90

Table 7.3 SAS output for vote example
7. Implementation in SAS
Figure7.1 Plot of Residual vs. ShareA for vote example
7. Implementation in SAS
Figure7.2 Plot of voteA vs. shareA for vote example
7. Implementation in SAS
SAS-Check Homoscedasticity
Figure7.3 Plots of SAS output for vote example
7. Implementation in SAS
SAS-Check Normality of Residuals

SAS code:

proc univariate data=fitvote normal;
  var R;
  qqplot R / normal (Mu=est Sigma=est);
run;

Tests for Location: Mu0=0

Test          Statistic          p Value
Student's t   t        0         Pr > |t|    1.0000
Sign          M     -0.5         Pr >= |M|   1.0000
Signed Rank   S   -170.5         Pr >= |S|   0.7969

Tests for Normality

Test                 Statistic          p Value
Shapiro-Wilk         W     0.952811     Pr < W      0.7395
Kolmogorov-Smirnov   D     0.209773     Pr > D      >0.1500
Cramer-von Mises     W-Sq  0.056218     Pr > W-Sq   >0.2500
Anderson-Darling     A-Sq  0.30325      Pr > A-Sq   >0.2500

Table 7.4 SAS output for checking normality
7. Implementation in SAS
SAS-Check Normality of Residuals
Figure7.4 Plot of Residual vs. Normal Quantiles for vote example
7. Implementation in SAS
• Linear regression is widely used to describe possible relationships between variables, and it ranks as one of the most important tools in many disciplines:
  - Marketing/business analytics
  - Healthcare
  - Finance
  - Economics
  - Ecology/environmental science
8. Application
• Prediction, forecasting, or deduction
Linear regression can be used to fit a predictive model to an observed data set of Y and X values. After developing such a model, if an additional value of X is given without its accompanying value of Y, the fitted model can be used to predict the value of Y.
8. Application
• Quantifying the strength of the relationship
Given a variable Y and a number of variables X1, ..., Xp that may be related to Y, linear regression analysis can be applied to assess which Xj may have no relationship with Y at all, and to identify which subsets of the Xj contain redundant information about Y.
8. Application
Example 1. Trend line
8. Application
A trend line represents a trend, the long-term movement in time series data after other components have been accounted for. Trend lines are sometimes used in business analytics to show changes in data over time.
Figure 8.1 Refrigerator sales over a 13-year period
http://www.likeoffice.com/28057/Excel-2007-Formatting-charts
Example 2. Clinical drug trials
8. Application
Regression analysis is widely utilized in healthcare. The graph shows an example in which we investigate the relationship between protein concentration and absorbance using linear regression analysis.

Figure 8.2 BSA Protein Concentration vs. Absorbance
http://openwetware.org/wiki/User:Laura_Flynn/Notebook/Experimental_Biological_Chemistry/2011/09/13
Summary

Linear Regression Analysis
• Probabilistic Models
• Least Square Estimate
• Statistical Inference
• Model Assumptions: Linearity, Constant Variance & Normality; Data Transformation
• Outliers & Influential Observations

Correlation Analysis
• Correlation Coefficient (Bivariate Normal Distribution, Exact t-test, Approximate z-test)
Acknowledgement & References

Acknowledgement
• Sincere thanks go to Prof. Wei Zhu.

References
• Statistics and Data Analysis, Ajit Tamhane & Dorothy Dunlop.
• Introductory Econometrics: A Modern Approach, Jeffrey M. Wooldridge, 5th ed.
• http://en.wikipedia.org/wiki/Regression_analysis
• http://en.wikipedia.org/wiki/Adrien_Marie_Legendre
• etc. (web links have already been included in the slides)