Unit 8
Linear Modeling
Linear Models
• The correlation coefficient measures the strength of the linear relationship between two quantitative variables x and y.
• A linear equation describing how a dependent variable, y, is associated with an explanatory variable, x, looks like
y = a + bx
Example
A college charges a basic fee of $100 a semester for a meal plan plus $2 a meal. The linear equation describing the association between the cost of the meal plan, y, and the number of meals purchased, x, is:
y = 100 + 2x
Linear Equations
A linear equation takes the form
y = a + bx
b = slope
a = y-intercept
The slope measures the rate of change of y with respect to x
The y-intercept measures the initial value of y (value of y when x = 0)
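As a small illustration of slope and y-intercept, the meal plan equation y = 100 + 2x can be written as a function. This is only a sketch; the function name `meal_plan_cost` is an assumption made for the example and does not come from the slides.

```python
def meal_plan_cost(meals: int) -> float:
    """Cost y of the meal plan for x meals, from y = 100 + 2x."""
    intercept = 100.0  # a: basic fee, the value of y when x = 0
    slope = 2.0        # b: change in cost for each additional meal
    return intercept + slope * meals


# The cost starts at the y-intercept and rises by the slope per meal.
for meals in (0, 10, 50):
    print(meals, meal_plan_cost(meals))
```

Buying one more meal always adds the slope, $2, to the cost, and buying no meals leaves only the $100 basic fee.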
Linear Modeling
• Rarely does an exact linear relationship exist between two studied variables.
• The correlation coefficient and the scatter plot help us decide if there is a reasonably strong linear relationship between two studied variables.
Data
The table gives the age and systolic blood pressure (SBP) of 30 subjects.

Individual  SBP (Y)  Age (X)    Individual  SBP (Y)  Age (X)
1           144      39         16          130      48
2           220      47         17          135      45
3           138      45         18          114      17
4           145      47         19          116      20
5           162      65         20          124      19
6           142      46         21          136      36
7           170      67         22          142      50
8           124      42         23          120      39
9           158      67         24          120      21
10          154      56         25          160      44
11          162      64         26          158      53
12          150      56         27          144      63
13          140      59         28          130      29
14          110      34         29          125      25
15          128      42         30          175      69
Approximate Positive Linear Relationship
[Figure: Scatterplot of Systolic Blood Pressure vs Age. x-axis: Age (10 to 70); y-axis: Systolic blood pressure (100 to 220)]
Equation of Fitted Line
SBP = 98.7 + 0.97(AGE)
y = 98.7 + 0.97 x
[Figure: Scatterplot of Systolic Blood Pressure vs Age. x-axis: Age (10 to 70); y-axis: Systolic blood pressure (100 to 220)]
Interpretation of Slope
• The slope of the SBP vs Age fitted equation is 0.97
• 0.97 = rate of change of SBP with respect to age
• For each additional year of age, predicted SBP is approximately 0.97 units higher.
Least Squares Method for Line of Best Fit
Interactive Unit D2, Basics, Basics 1
Interactive Unit D2, Basics, Practice 1
Residuals
• One method for assessing how well a linear equation models the data is assessing the extent to which points differ from the line.
• A residual is the difference between an observed y value and the corresponding value of y on the fitted line (predicted y)
• Residual = Observed y - Predicted y
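The residual formula can be tried on one subject from the data table. The sketch below assumes the fitted line SBP = 98.7 + 0.97(AGE) quoted in the slides; the function names are made up for the illustration.

```python
def predicted_sbp(age: float) -> float:
    """Predicted y on the fitted line SBP = 98.7 + 0.97(AGE)."""
    return 98.7 + 0.97 * age


def residual(observed_y: float, age: float) -> float:
    """Residual = Observed y - Predicted y."""
    return observed_y - predicted_sbp(age)


# Individual 1 in the data table: SBP 144 at age 39.
r1 = residual(144, 39)
print(f"{r1:.2f}")  # prints 7.47
```

A positive residual means the observed point lies above the fitted line; a negative residual means it lies below.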
Sum of Squares of the Residuals
• The line of best fit is the one with the smallest sum of squares of the residuals
• It is called the least squares line or sometimes the least squares regression line
• The challenge is to find the slope and y-intercept of this least squares line
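To see what "smallest sum of squares of the residuals" means in practice, the sketch below compares two candidate lines on a tiny data set. The four (x, y) points are hypothetical and chosen only for illustration; for these points the least squares line works out to y = 0 + 1.9x.

```python
def sum_sq_residuals(a, b, xs, ys):
    """Sum of squared residuals for the line y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not from the slides.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

ss_least_squares = sum_sq_residuals(0.0, 1.9, xs, ys)  # least squares line
ss_other = sum_sq_residuals(0.0, 2.0, xs, ys)          # a nearby competitor

print(f"y = 1.9x: {ss_least_squares:.2f}")
print(f"y = 2.0x: {ss_other:.2f}")
```

The line y = 2x passes exactly through three of the four points, yet its sum of squares (1.00) is larger than that of the least squares line (0.70), because squaring penalizes its single large miss heavily.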
More Practice with Finding the Least Squares Line
• Interactive D2, Basics, Basics2
The “Formulas”
The methods of calculus can be used to find equations for the slope and y-intercept of the least squares line. Here are the results.
$$b = \frac{\sum (x - \bar{X})(y - \bar{Y})}{\sum (x - \bar{X})^2}$$

$$a = \bar{Y} - b\bar{X}$$
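For readers curious about the calculus step, here is a brief sketch of one standard way to arrive at the slope and y-intercept of the least squares line. The symbol S(a, b) for the sum of squares of the residuals is notation introduced here, not taken from the slides.

```latex
% S(a, b): sum of squares of the residuals for the line y = a + bx
S(a, b) = \sum (y - a - bx)^2

% Set both partial derivatives equal to zero
\frac{\partial S}{\partial a} = -2 \sum (y - a - bx) = 0
  \quad\Rightarrow\quad a = \bar{Y} - b\bar{X}

\frac{\partial S}{\partial b} = -2 \sum x\,(y - a - bx) = 0

% Substitute a = \bar{Y} - b\bar{X} into the second equation
\sum x\,\bigl[(y - \bar{Y}) - b(x - \bar{X})\bigr] = 0
  \quad\Rightarrow\quad
  b = \frac{\sum x\,(y - \bar{Y})}{\sum x\,(x - \bar{X})}
    = \frac{\sum (x - \bar{X})(y - \bar{Y})}{\sum (x - \bar{X})^2}
```

The last equality holds because the deviations from a mean sum to zero: \sum (y - \bar{Y}) = 0 and \sum (x - \bar{X}) = 0.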
The Good News
Many computer programs including Excel and MINITAB as well as graphing calculators provide the slope and y-intercept of the least squares line
Example
Find the slope and y-intercept for the least squares line describing the association between age and blood pressure suggested by the data for the 30 subjects given in the earlier table.
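Software normally does this work, but the slope and y-intercept can also be computed directly from their definitions. The following is a minimal sketch in plain Python, using the 30 (SBP, age) pairs from the data table; the names `DATA` and `least_squares` are assumptions made for the example.

```python
# (SBP, age) pairs for individuals 1 to 30, in table order.
DATA = [
    (144, 39), (220, 47), (138, 45), (145, 47), (162, 65),
    (142, 46), (170, 67), (124, 42), (158, 67), (154, 56),
    (162, 64), (150, 56), (140, 59), (110, 34), (128, 42),
    (130, 48), (135, 45), (114, 17), (116, 20), (124, 19),
    (136, 36), (142, 50), (120, 39), (120, 21), (160, 44),
    (158, 53), (144, 63), (130, 29), (125, 25), (175, 69),
]


def least_squares(xs, ys):
    """Return (a, b), the y-intercept and slope of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


ages = [age for _, age in DATA]
sbps = [sbp for sbp, _ in DATA]
a, b = least_squares(ages, sbps)
print(f"SBP = {a:.1f} + {b:.2f}(AGE)")
```

Rounded to the precision used in the slides, the result agrees with the fitted line SBP = 98.7 + 0.97(AGE).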
The Line of Best Fit
• The line that best fits the data is taken to be the one with the “smallest” residuals.
• Since residuals can be both positive and negative, they are squared to ensure all are positive
• The squared residuals are then added to find a measure of the total amount the fitted values deviate from the observed values
Least Squares Line
Y = SBP, X = Age
Y = 98.7 + 0.97X
Predictions
The prediction equation y = 98.7 + 0.97x
can be used to predict a person’s SBP based on their age.
For a randomly selected person who is 40 years old, the least squares equation predicts an SBP of
98.7 + 0.97(40) = 137.5
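The same prediction can be expressed as a short function. This is a sketch under the assumption that the rounded coefficients 98.7 and 0.97 from the slides are used; `predict_sbp` is a made-up name.

```python
def predict_sbp(age: float) -> float:
    """Predicted SBP from the least squares line y = 98.7 + 0.97x."""
    return 98.7 + 0.97 * age


print(f"{predict_sbp(40):.1f}")  # prints 137.5
```

Predictions are most trustworthy for ages inside the range of the data (17 to 69 in the table of 30 subjects).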
Making Predictions
Use the sample least squares line
y = 98.7 + 0.97x
to complete the table
Age   35   45   55   65
SBP
Back to Residuals
$$\text{SSRes} = \sum (y_{\text{observed}} - y_{\text{predicted}})^2$$
is a measure of the total amount of deviation from the fitted line.
It is a measure of the variability in the data that is not explained by the linear relationship with the variable x.
It measures the variability due to factors other than the explanatory variable x.
Back to Age vs SBP
• SSRes = $\sum (y_{\text{observed}} - y_{\text{predicted}})^2$ = 8393.44
• SSTotal = $\sum (y - \bar{Y})^2$ = 14787.47
• SSRes / SSTotal = 8393.44 / 14787.47 = 0.5676
• 56.76% of the variability in the SBP data is explained by factors other than age
• 1 - 56.76% = 43.24% of the variability in SBP can be explained by the linear relationship with age
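The two percentages follow from simple arithmetic on SSRes and SSTotal. A quick check, using the two sums of squares exactly as reported in the slides (the variable names are chosen for the example):

```python
ss_res = 8393.44     # SSRes reported for the SBP vs Age data
ss_total = 14787.47  # SSTotal reported for the SBP vs Age data

unexplained = ss_res / ss_total  # share due to factors other than age
explained = 1 - unexplained      # share explained by the line with age

print(f"{unexplained:.2%}")  # prints 56.76%
print(f"{explained:.2%}")    # prints 43.24%
```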
The value of r^2
• The correlation coefficient, r, for the SBP vs Age data is 0.65757
• r^2 = (0.65757)^2 = 0.4324
• When r^2 is converted to a percent, 43.24%, it corresponds to the percent variability in SBP that is explained by age
Interpretation of r^2
When r^2 is converted to a percent it can be interpreted as the percent of the variability in the response variable, y, that can be explained by the linear relationship with the explanatory variable, x.
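The link between the correlation coefficient and the sums of squares can be checked numerically. The sketch below recomputes r for the 30 subjects and compares its square with 1 - SSRes/SSTotal. It assumes the usual formula for r in terms of sums of deviations, which the slides do not spell out, and all variable names are chosen for the example.

```python
from math import sqrt

# (SBP, age) pairs for individuals 1 to 30, in table order.
DATA = [
    (144, 39), (220, 47), (138, 45), (145, 47), (162, 65),
    (142, 46), (170, 67), (124, 42), (158, 67), (154, 56),
    (162, 64), (150, 56), (140, 59), (110, 34), (128, 42),
    (130, 48), (135, 45), (114, 17), (116, 20), (124, 19),
    (136, 36), (142, 50), (120, 39), (120, 21), (160, 44),
    (158, 53), (144, 63), (130, 29), (125, 25), (175, 69),
]
ys = [sbp for sbp, _ in DATA]
xs = [age for _, age in DATA]

n = len(DATA)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # this is SSTotal
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Correlation coefficient (assumed standard formula).
r = sxy / sqrt(sxx * syy)

# Least squares line and its sum of squared residuals.
b = sxy / sxx
a = y_bar - b * x_bar
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

print(f"r = {r:.5f}")
print(f"r^2 = {r ** 2:.4f}")
print(f"1 - SSRes/SSTotal = {1 - ss_res / syy:.4f}")
```

The last two printed values agree, which is why r^2 can be read as the fraction of variability in y explained by the linear relationship with x.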
• Find the least squares line and the values of r and r^2
• Interpret r^2
• Interpret the slope

Model                Weight (pounds)  City MPG
BMW 318Ti            2790             23
BMW Z3               2960             19
Chevrolet Camaro     3545             17
Chevrolet Corvette   3295             17
Ford Mustang         3270             17
Honda prelude        3040             22
Hyundai Tiburon      2705             22
Mazda Miata          2365             25
Mercury Cougar       3140             20
Mercedes Benz SLK    3020             22
Mitsubishi Eclipse   3235             23
Pontiac Firebird     3545             18
Porsche Boxster      2905             19
Saturn SC            2420             27
Toyota Celica        2720             22
Scatter Graph
r = -0.816
[Figure: Scatterplot of City MPG vs Weight. x-axis: Weight (2500 to 3500); y-axis: City MPG (16 to 28)]
Residuals
Model Weight City MPG Residual
BMW 318Ti 2790 23 0.69556
BMW Z3 2960 19 -2.12366
Chevrolet Camaro 3545 17 -0.06038
Chevrolet Corvette 3295 17 -1.79682
Ford Mustang 3270 17 -1.97047
Honda prelude 3040 22 1.43200
Hyundai Tiburon 2705 22 -0.89483
Mazda Miata 2365 25 -0.25640
Mercury Cougar 3140 20 0.12658
Mercedes Benz SLK 3020 22 1.29309
Mitsubishi Eclipse 3235 23 3.78643
Pontiac Firebird 3545 18 0.93962
Porsche Boxster 2905 19 -2.50568
Saturn SC 2420 27 2.12562
Toyota Celica 2720 22 -0.79065
Vehicles with the Largest Positive and Negative Residuals
• Mitsubishi Eclipse got 3.786 city MPG more than expected
• Porsche Boxster got 2.506 city MPG less than expected
Analysis
• City MPG = 41.7 - 0.00695 Weight
• Each additional pound translates into a loss of approximately 0.00695 city MPG
• Each additional 1000 pounds translates into a loss of approximately 6.95 city MPG
• r^2 = 66.6%
• 66.6% of the variability in city MPG can be explained by the linear association with the weight of the vehicle. 33.4% of the variability in city MPG is due to factors other than the weight of the vehicle.
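The whole analysis for the vehicle data can be reproduced in a few lines. This is a sketch in plain Python using the weights and city MPG values from the table; the names `CARS`, `slope`, `intercept`, and `residuals` are assumptions made for the example, and the formula for r is the usual one rather than something stated in the slides.

```python
from math import sqrt

# (model, weight in pounds, city MPG), in table order.
CARS = [
    ("BMW 318Ti", 2790, 23), ("BMW Z3", 2960, 19),
    ("Chevrolet Camaro", 3545, 17), ("Chevrolet Corvette", 3295, 17),
    ("Ford Mustang", 3270, 17), ("Honda prelude", 3040, 22),
    ("Hyundai Tiburon", 2705, 22), ("Mazda Miata", 2365, 25),
    ("Mercury Cougar", 3140, 20), ("Mercedes Benz SLK", 3020, 22),
    ("Mitsubishi Eclipse", 3235, 23), ("Pontiac Firebird", 3545, 18),
    ("Porsche Boxster", 2905, 19), ("Saturn SC", 2420, 27),
    ("Toyota Celica", 2720, 22),
]
xs = [weight for _, weight, _ in CARS]
ys = [mpg for _, _, mpg in CARS]

n = len(CARS)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r = sxy / sqrt(sxx * syy)

# Residual = observed city MPG - predicted city MPG, keyed by model.
residuals = {
    model: mpg - (intercept + slope * weight)
    for model, weight, mpg in CARS
}
highest = max(residuals, key=residuals.get)
lowest = min(residuals, key=residuals.get)

print(f"City MPG = {intercept:.1f} + ({slope:.5f}) Weight")
print(f"r = {r:.3f}, r^2 = {r ** 2:.1%}")
print(f"Largest positive residual: {highest} ({residuals[highest]:.3f})")
print(f"Largest negative residual: {lowest} ({residuals[lowest]:.3f})")
```

The two extreme residuals identify the vehicles whose city MPG is furthest above and furthest below what the line predicts from weight alone.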