introduction to linear regression notes/regression.pdf · 2014-09-25 · predicting height from...
TRANSCRIPT
Introduction to Linear Regression
James H. Steiger
Department of Psychology and Human DevelopmentVanderbilt University
James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46
Introduction to Linear Regression

1 Introduction

2 Fitting a Straight Line
  Introduction
  Characteristics of a Straight Line
  Regression Notation
  The Least Squares Solution

3 Predicting Height from Shoe Size
  Creating a Fit Object
  Examining Summary Statistics
  Drawing the Regression Line
  Using the Regression Line
  Thinking about Residuals

4 Partial Correlation
  An Example

5 Review Questions
  Question 1
  Question 2
  Question 3
Introduction
In this module, we discuss an extremely important technique in statistics: linear regression.
Linear regression is very closely related to correlation, and is extremely useful in a wide range of areas.
Introduction
We begin by loading some data relating height to shoe size and drawing the scatterplot for the male data.
> all.heights <- read.csv("shoesize.csv")
> male.data <- all.heights[all.heights$Gender == "M", ] #Select males
> attach(male.data) #Make Variables Available
Introduction
Next, we draw the scatterplot. The points align themselves in a linear pattern.
> # Draw scatterplot
> plot(Size, Height, xlab = "Shoe Size", ylab = "Height in Inches")
[Figure: scatterplot of Height in Inches versus Shoe Size for the male data]
Introduction
> cor(Size, Height)
[1] 0.7677
The correlation is an impressive 0.77. But how can we characterize the relationship between shoe size and height?
In this case, linear regression is going to prove very useful.
Fitting a Straight Line: Introduction
If data are scattered around a straight line, then the relationship between the two variables can be thought of as being represented by that straight line, with some "noise" or error thrown in.
We know that the correlation coefficient is a measure of how well the points will fit a straight line. But which straight line is best?
Fitting a Straight Line: Introduction
The key to understanding this is to realize the following:
1 Any straight line can be characterized by just two parameters, a slope and an intercept, and the equation for the straight line is Y = b0 + b1X, where b1 is the slope and b0 is the intercept.
2 Any point can be characterized relative to a particular line in terms of two quantities: (a) where its X falls on the line, and (b) how far its Y is from the line in the vertical direction.
Let’s examine each of these preceding points.
Fitting a Straight Line: Characteristics of a Straight Line
The slope multiplies X, and so any change in X is multiplied by the slope and passed on to Y. Consequently, the slope represents "the rise over the run," the amount by which Y increases for each unit increase in X.
The intercept is, of course, the value of Y when X = 0.
So if you have the slope and intercept, you have the line.
Fitting a Straight Line: Characteristics of a Straight Line
Suppose we draw a line — any line — in a plane.
Then consider a point — any point — with respect to that line.
What can we say? Let’s use a concrete example.
Suppose I draw the straight line whose equation is Y = 1.04X + 0.2 in a plane, and then plot the point (2, 3) by going over to 2 on the X-axis, then up to 3 on the Y-axis.
Fitting a Straight Line: Characteristics of a Straight Line
[Figure: the line Y = 1.04X + 0.2 plotted in the plane, with the point (2, 3) marked]
Fitting a Straight Line: Characteristics of a Straight Line
Now suppose I were to try to use the straight line to predict the Y value of the point only from a knowledge of the X value of that point.
The X value of the point is 2. If I substitute 2 for X in the formula Y = 1.04X + 0.2, I get Y = 2.28.
This value lies on the line, directly above X. I'll draw that point on the scatterplot in blue.
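This arithmetic is easy to check in R (a quick sketch, not part of the original slides):

```r
# The line Y = 1.04*X + 0.2, written as a function
line.y <- function(x) 1.04 * x + 0.2

# Predicted Y for the point whose X value is 2
line.y(2)   # 2.28
```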
Fitting a Straight Line: Characteristics of a Straight Line
[Figure: the same plot, with the predicted point (2, 2.28) drawn on the line in blue]
Fitting a Straight Line: Characteristics of a Straight Line
The Y value for the blue point is called the "predicted value of Y," and is denoted Ŷ.
Unless the actual point falls on the line, there will be some error in this prediction. The error is the discrepancy in the vertical direction from the line to the point.
Fitting a Straight Line: Characteristics of a Straight Line
[Figure: the point (X, Y), the predicted value Ŷ on the line directly below Y, and the vertical error E between them]
Fitting a Straight Line: Regression Notation
Now, let’t generalize!
We have just shown that, for any point with coordinates (Xi, Yi), relative to any line Y = b0 + b1X, I may write

Ŷi = b0 + b1Xi (1)

and

Yi = Ŷi + Ei (2)

with Ei defined tautologically as

Ei = Yi − Ŷi (3)
But we are not looking for any line. We are looking for the best line. And we have many points, not just one. And, by the way, what is the best line, and how do we find it?
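Before moving on, the three defining equations can be verified numerically for the single line and point used above (an illustrative sketch, not from the original slides):

```r
# Line and point from the earlier example
b0 <- 0.2; b1 <- 1.04
X.i <- 2;  Y.i <- 3

Y.hat.i <- b0 + b1 * X.i   # Equation (1): predicted value, 2.28
E.i     <- Y.i - Y.hat.i   # Equation (3): error, 0.72
Y.hat.i + E.i              # Equation (2): recovers Y.i, i.e., 3
```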
Fitting a Straight Line: The Least Squares Solution
It turns out, there are many possible ways of characterizing how well a line fits a set of points.
However, one approach seems quite reasonable, and has many absolutely beautiful mathematical properties.
This is the least squares criterion and the least squares solution for b1 and b0.
Fitting a Straight Line: The Least Squares Solution
The least squares criterion states that the best-fitting line for a set of points is the line that minimizes the sum of squares of the Ei over the entire set of points.
Remember, the data points are there, plotted in the plane, nailed down, as it were. The only thing free to vary is the line, and it is characterized by just two parameters, the slope and intercept.
For any slope b1 and intercept b0 I might choose, I can compute the sum of squared errors. And for any data set, the sum of squared errors is uniquely defined by that slope and intercept.
The sum of squared errors is thus a function of b1 and b0.
What we really have is a problem in minimizing a function of two unknowns.
This is a routine problem in first-year calculus. We won't go through the proof of the least squares solution; we'll simply give you the result.
Fitting a Straight Line: The Least Squares Solution
The solution to the least squares criterion is as follows:

b1 = r_yx · (s_y / s_x) = s_yx / s_x² (4)

and

b0 = M_y − b1 · M_x (5)

Note: If X and Y are both in Z-score form, then b1 = r_yx and b0 = 0.
Thus, once we remove the metric from the numbers, the very intimate connection between correlation and regression is revealed!
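As a check (a sketch assuming the male.data variables loaded earlier are still attached), the least squares formulas can be computed from summary statistics and compared with R's built-in lm function:

```r
# Slope and intercept from summary statistics, per Equations (4) and (5)
b1 <- cor(Size, Height) * sd(Height) / sd(Size)
b0 <- mean(Height) - b1 * mean(Size)
c(intercept = b0, slope = b1)        # should match coef(lm(Height ~ Size))

# With both variables in Z-score form, the slope equals the correlation
coef(lm(scale(Height) ~ scale(Size)))
```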
Predicting Height from Shoe Size: Creating a Fit Object
We could easily construct the slope and intercept of our regression line from summary statistics. But R actually has a facility to perform the entire analysis very quickly and automatically. You begin by producing a linear model fit object with the following syntax.
> fit.object <- lm(Height ~ Size)
R is an object-oriented language. That is, objects can contain data, and when general functions are applied to an object, the object "knows what to do." We'll demonstrate on the next slide.
Predicting Height from Shoe Size: Examining Summary Statistics
R has a generic function called summary. Look what happens when we apply it to our fit object.
> summary(fit.object)
Call:
lm(formula = Height ~ Size)
Residuals:
Min 1Q Median 3Q Max
-7.289 -1.112 0.066 1.356 5.824
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.5460 1.0556 49.8 <2e-16 ***
Size 1.6453 0.0928 17.7 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.02 on 219 degrees of freedom
Multiple R-squared: 0.589, Adjusted R-squared: 0.588
F-statistic: 314 on 1 and 219 DF, p-value: <2e-16
Predicting Height from Shoe Size: Examining Summary Statistics
The coefficients for the intercept and slope are perhaps the most important part of the output. Here we see that the slope of the line is 1.6453 and the intercept is 52.5460.
Predicting Height from Shoe Size: Examining Summary Statistics
Along with the estimates themselves, the program provides estimated standard errors of the coefficients, along with t statistics for testing the hypothesis that the coefficient is zero.
Predicting Height from Shoe Size: Examining Summary Statistics
The program prints the R² value, also known as the coefficient of determination. When there is only one predictor, as in this case, the R² value is just r²_xy, the square of the correlation between height and shoe size.
The "adjusted R²" value is an approximately unbiased estimator. With only one predictor, it can essentially be ignored, but with many predictors, it can be much lower than the standard R² estimate.
The F-statistic tests the hypothesis that R² = 0.
When there is only one predictor, it is the square of the t-statistic for testing that r_xy = 0.
Predicting Height from Shoe Size: Drawing the Regression Line
Now we draw the scatterplot with the best-fitting straight line. Notice how we draw the scatterplot first with the plot command, then draw the regression line in red with the abline command.
> # draw scatterplot
> plot(Size, Height)
> # draw regression line in red
> abline(fit.object, col = "red")
Predicting Height from Shoe Size: Drawing the Regression Line
[Figure: scatterplot of Height versus Size with the fitted regression line drawn in red]
Predicting Height from Shoe Size: Using the Regression Line
We can now use the regression line to estimate a male student's height from his shoe size.
Suppose a student's shoe size is 13. What is his predicted height?

Ŷ = b1X + b0 = (1.6453)(13) + 52.5460 = 73.9349

The predicted height is a bit less than 6 feet 2 inches.
Of course, we know that not every student who has a size 13 shoe will have a height of 73.93. Some will be taller than that, some will be shorter. Is there something more we can say?
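Rather than plugging into the equation by hand, the same prediction can be obtained from the fit object with R's predict function (a sketch using the fit.object created earlier):

```r
# Predicted height for a shoe size of 13
predict(fit.object, newdata = data.frame(Size = 13))   # about 73.93
```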
Predicting Height from Shoe Size: Thinking about Residuals
The predicted value Ŷ = 73.93 actually represents the average height of people with a shoe size of 13.
According to the most commonly used linear regression model, the heights of people with a shoe size of 13 actually have a normal distribution with a mean of 73.93 and a standard deviation called the "standard error of estimate."
This quantity goes by several names, and in R output is called the "residual standard error."
An estimate of this quantity is included in the R regression output produced by the summary function.
Predicting Height from Shoe Size: Thinking about Residuals
In the population, the standard error of estimate is calculated from the following formula:

σ_e = √(1 − ρ²_xy) · σ_y (6)

In the sample, we estimate the standard error of estimate with the following formula:

s_e = √((n − 1)/(n − 2)) · √(1 − r²_xy) · s_y (7)
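Equation (7) can be checked against the "Residual standard error" reported by summary (a sketch; assumes fit.object and the attached male.data from earlier):

```r
# Standard error of estimate from summary statistics, per Equation (7)
n   <- length(Height)
s.e <- sqrt((n - 1) / (n - 2)) * sqrt(1 - cor(Size, Height)^2) * sd(Height)
s.e   # should match sigma(fit.object), about 2.02
```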
Partial Correlation: An Example
Residuals can be thought of as "the part of Y that is left over after that which can be predicted from X is partialled out."
This notion has led to the concept of partial correlation.
Let’s introduce this notion in connection with an example.
Suppose we gathered data on house fires in the Nashville area over the past month. We have data on two variables: damage done by the fire, in thousands of dollars (Damage), and the number of fire trucks sent to the fire by the fire department (Trucks).
Here are the data for the last 10 fires.
Partial Correlation: An Example
Trucks Damage
1 0 8
2 0 9
3 1 33
4 1 38
5 1 27
6 2 70
7 2 94
8 2 83
9 3 133
10 3 135
Partial Correlation: An Example
Plotting the regression line, we see that there is indeed a strong linear relationship between the number of fire trucks sent to a fire and the damage done by the fire.
> plot(Trucks, Damage)
> abline(lm(Damage ~ Trucks), col = "red")
Partial Correlation: An Example
[Figure: scatterplot of Damage versus Trucks with the fitted regression line drawn in red]
Partial Correlation: An Example
The correlation between Trucks and Damage is 0.9779.
Does this mean that the damage done by fire can be reduced by sendingfewer trucks?
Of course not. It turns out that the house fire records include another piece of information. Based on a complex rating system, each house fire has a rating based on the size of the conflagration. These ratings are in a variable called FireSize.
On purely substantive and logical grounds, we might suspect that, rather than the fire trucks causing the damage, this third variable, FireSize, causes both more damage to be done and more fire trucks to be sent.
How can we investigate this notion statistically?
Partial Correlation: An Example
Suppose we predict Trucks from FireSize. The residuals represent that part of Trucks that isn't attributable to FireSize. Call these residuals E_Trucks|FireSize.
Then suppose we predict Damage from FireSize. The residuals represent that part of Damage that cannot be predicted from FireSize. Call these residuals E_Damage|FireSize.
The correlation between these two residual variables is called the partial correlation between Trucks and Damage with FireSize partialled out, and is denoted r_Trucks,Damage|FireSize.
Partial Correlation: An Example
There are several ways we can compute this partial correlation.
One way is to compute the two residual variables discussed above, and then compute the correlation between them.
> fit.1 <- lm(Trucks ~ FireSize)
> fit.2 <- lm(Damage ~ FireSize)
> E.1 <- residuals(fit.1)
> E.2 <- residuals(fit.2)
> cor(E.1, E.2)
[1] -0.2163
Partial Correlation: An Example
Another way is to use the textbook formula:

r_xy|w = (r_xy − r_xw · r_yw) / √((1 − r²_xw)(1 − r²_yw)) (8)
> r.xy <- cor(Trucks, Damage)
> r.xw <- cor(Trucks, FireSize)
> r.yw <- cor(Damage, FireSize)
> r.xy.given.w <- (r.xy - r.xw * r.yw)/sqrt((1 - r.xw^2) * (1 - r.yw^2))
> r.xy.given.w
[1] -0.2163
Partial Correlation: An Example
The partial correlation is −0.216.
Once size of fire is accounted for, there is a negative correlation between the number of fire trucks sent to the fire and the damage done by the fire.
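Since the FireSize values themselves are not listed in the transcript, here is a self-contained sketch with simulated data showing that the residual-correlation method and the textbook formula in Equation (8) give identical answers:

```r
set.seed(1)
# Simulate a confounder w that drives both x and y
w <- rnorm(100)
x <- w + rnorm(100, sd = 0.5)
y <- 2 * w + rnorm(100, sd = 0.5)

# Method 1: correlate the two residual variables
r.resid <- cor(residuals(lm(x ~ w)), residuals(lm(y ~ w)))

# Method 2: the textbook formula, Equation (8)
r.xy <- cor(x, y); r.xw <- cor(x, w); r.yw <- cor(y, w)
r.formula <- (r.xy - r.xw * r.yw) / sqrt((1 - r.xw^2) * (1 - r.yw^2))

all.equal(r.resid, r.formula)   # TRUE
```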
Review Questions: Question 1
Recall that Ŷ = b1X + b0, with b1 = ρ_YX · σ_Y/σ_X.
The predicted scores Ŷ have a variance. Prove, using the laws of linear transformation, that this variance may be calculated as

σ²_Ŷ = ρ²_YX · σ²_Y (9)
Review Questions: Question 1
Answer. The laws of linear transformation state that, if Y = aX + b, then S²_Y = a² · S²_X. In other words, additive constants can be ignored, and multiplicative constants "come straight through squared" in the variance.
Translating this idea to the Greek-letter notation of the current problem, we find

σ²_Ŷ = σ²_(b1X + b0) (10)
     = b1² · σ²_X (11)
     = (ρ²_YX · σ²_Y / σ²_X) · σ²_X (12)
     = ρ²_YX · σ²_Y (13)
Review Questions: Question 2
In the lecture notes on linear combinations, we demonstrate that the covariance of two linear combinations may be derived by taking the algebraic product of the two linear combination expressions, and then applying a straightforward conversion rule.
Using this approach, show that the covariance between Y and Ŷ is equal to the variance of Ŷ derived in the preceding problem.
Hint. The covariance of Y and Ŷ is equal to the covariance of Y and b1X + b0.
Review Questions: Question 2
Answer. σ_(Y, b1X+b0) = σ_(Y, b1X), because additive constants never affect covariances. Now, applying the multiplicative rule, we find that

σ_(Y, b1X) = b1 · σ_YX (14)
           = (ρ_YX · σ_Y/σ_X) · ρ_YX · σ_Y · σ_X (15)
           = ρ²_YX · σ²_Y (16)
Review Questions: Question 3
Sarah got a Z score of +2.00 on the first midterm. If midterm 1 and midterm 2 are correlated 0.75 and the relationship is linear, what is the predicted Z-score of Sarah on the second exam?
Review Questions: Question 3
Answer. When scores are in Z-score form, the formula for a predicted score is Ŷ = ρ_YX · X, and so Sarah's predicted score is (2)(0.75) = 1.50. This is a classic example of regression toward the mean.