![Page 1: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/1.jpg)
Chapter 8 Linear Regression
HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?
![Page 2: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/2.jpg)
Fat Versus Protein: An Example
The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu:
How many grams of fat would an item with 25 grams of protein have?
![Page 3: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/3.jpg)
What is Linear Regression?
Remember that correlation measures the strength and direction of a linear relationship between two quantitative variables.
We can say more about the linear relationship between two quantitative variables with a model.
The linear relationship is modeled by a straight line through the data.
The data points do not all line up on the line, but a straight line summarizes the overall direction of the data.
![Page 4: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/4.jpg)
Regression and Residuals
Some points will be above the line and some will be below it.
The estimate made from a model is the predicted value (denoted as ŷ ).
The difference between the observed value and the predicted value is known as the residual:
$\text{residual} = \text{observed} - \text{predicted} = y - \hat{y}$
![Page 5: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/5.jpg)
Residuals (cont.)
A negative residual means the predicted value’s too big (an overestimate).
A positive residual means the predicted value’s too small (an underestimate).
![Page 6: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/6.jpg)
Line of Best Fit
Some residuals are positive (above the predicted line) and some are negative (below the predicted line).
To measure how well the line fits, we cannot simply add up the residuals, because the positives and negatives cancel each other out. Instead, we add up the squared residuals.
The line of best fit is the line where the sum of the squared residuals is the smallest.
The regression line is also known as the least squares regression line (LSRL).
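The least squares idea can be written compactly. This is a sketch of the criterion only; the symbols $b_0$ (intercept), $b_1$ (slope), and the index $i$ over the $n$ data points are notation assumed here for illustration.

```latex
\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\qquad \text{where } \hat{y}_i = b_0 + b_1 x_i
```

Each term in the sum is one squared residual, so no residual can cancel another.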
![Page 7: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/7.jpg)
Line of best fit
It is written as $\hat{y} = a + bx$ or, equivalently, $\hat{y} = b_0 + b_1 x$.
![Page 8: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/8.jpg)
Slope of the regression line
Our slope is always in units of y per unit of x
$b_1 = r \dfrac{s_y}{s_x}$
![Page 9: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/9.jpg)
Y intercept
Our intercept is always in units of y
$b_0 = \bar{y} - b_1 \bar{x}$
![Page 10: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/10.jpg)
Residuals Revisited
The linear model assumes that the relationship between the two variables is a perfect straight line. The residuals are the part of the data that has not been modeled.
Data = Model + Residual
Residual = Data – Model
In symbols: $e = y - \hat{y}$
![Page 11: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/11.jpg)
Example
Given the regression line for the previous scatter plot
ŷ = 6.413 + 0.9769x
Predicted Fat = 6.413 + 0.9769 × protein
What does the slope represent?
What does the y-intercept mean?
![Page 12: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/12.jpg)
Example continued
Given the regression line for the previous scatter plot
ŷ = 6.413 + 0.9769x
Predicted Fat = 6.413 + 0.9769 × protein
How much fat would we expect an item with 12 grams of protein to have?
How much protein would an item with 15 grams of fat have?
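A prediction from this line is just a substitution. The sketch below plugs 12 grams of protein into the equation quoted on the slide; the names `INTERCEPT`, `SLOPE`, and `predict_fat` are hypothetical and chosen only for illustration.

```python
# Fitted line quoted on the slide: predicted fat = 6.413 + 0.9769 * protein
INTERCEPT = 6.413   # grams of fat when protein is 0
SLOPE = 0.9769      # grams of fat per gram of protein


def predict_fat(protein_g: float) -> float:
    """Predicted grams of fat for an item with the given grams of protein."""
    return INTERCEPT + SLOPE * protein_g


fat_at_12 = predict_fat(12)
print(f"Predicted fat at 12 g protein: {fat_at_12:.2f} g")
```

The second question runs the other way, from fat to protein. This line was fit to predict fat from protein, so solving it backwards is not the same as a least squares prediction of protein; that would call for a separate regression of protein on fat.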
![Page 13: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/13.jpg)
Example continued
Given the regression line for the previous scatter plot
ŷ = 6.413 + 0.9769x
Predicted Fat = 6.413 + 0.9769 × protein
A Double Whopper sandwich has 48 grams of protein and 58 grams of fat. What is the residual in fat for this sandwich?
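The residual is the observed value minus the predicted value. A minimal sketch for the Double Whopper, using the equation and the figures given on the slide (the variable names are hypothetical):

```python
# Fitted line quoted on the slide: predicted fat = 6.413 + 0.9769 * protein
protein_g = 48.0      # observed protein for the Double Whopper
observed_fat = 58.0   # observed fat for the Double Whopper

predicted_fat = 6.413 + 0.9769 * protein_g
residual = observed_fat - predicted_fat   # observed minus predicted

print(f"Predicted fat: {predicted_fat:.2f} g")
print(f"Residual:      {residual:.2f} g")
```

The residual comes out positive, so the line underestimates the fat in this sandwich.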
![Page 14: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/14.jpg)
Example Burger King
The following are select items from the Burger King Menu with grams of fat and total calories
| Item | Calories | Grams of fat |
|---|---|---|
| Whopper | 650 | 37 |
| Whopper with cheese | 730 | 44 |
| Big King | 530 | 31 |
| Hamburger | 230 | 9 |
| Cheeseburger | 270 | 12 |
| Tendergrill chicken Sandwich | 460 | 21 |
| Original chicken Sandwich | 660 | 40 |
| Big fish Sandwich | 520 | 28 |
| BK Veggie Burger | 390 | 16 |
![Page 15: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/15.jpg)
Example Continued
What is the regression line for the data?
What is the slope in the context of the problem?
What is the y-intercept in the context of the problem?
A sandwich with 15 grams of fat would be expected to have how many calories?
A sandwich with 450 calories would be expected to have how many grams of fat?
A Bacon Cheeseburger has 13 grams of fat and 290 total calories. What is the residual in calories for this sandwich?
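One way to work these questions is to compute the least squares line directly from the nine sandwiches in the table. The sketch below assumes fat is the explanatory variable and calories the response; the function name `least_squares` and the variable names are hypothetical.

```python
from math import sqrt

# (grams of fat, calories) for the nine sandwiches in the table
fat = [37, 44, 31, 9, 12, 21, 40, 28, 16]
calories = [650, 730, 530, 230, 270, 460, 660, 520, 390]


def least_squares(x, y):
    """Return (intercept, slope, r) for the line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # algebraically equal to r * s_y / s_x
    intercept = y_bar - slope * x_bar  # the line passes through (x-bar, y-bar)
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


b0, b1, r = least_squares(fat, calories)
predicted_15 = b0 + b1 * 15               # calories expected at 15 g of fat
residual_bacon = 290 - (b0 + b1 * 13)     # observed minus predicted

print(f"calories-hat = {b0:.2f} + {b1:.2f} * fat   (r = {r:.4f})")
print(f"Predicted calories at 15 g fat: {predicted_15:.1f}")
print(f"Bacon Cheeseburger residual:    {residual_bacon:.1f} calories")
```

The question about a 450-calorie sandwich goes from calories to fat, so it is better answered by a second regression with fat as the response than by solving this line backwards.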
![Page 16: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/16.jpg)
Conditions Required
1. Quantitative Variable condition
2. Straight enough condition
3. Outlier condition
![Page 17: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/17.jpg)
R-Squared
R² gives the fraction of the data’s variation accounted for by the model, and 1 − R² is the fraction of the original variation left in the residuals.
Example: Burger King sandwich example
r is 0.9881
r² is 0.9763
97.63% of the variation in calorie content among Burger King sandwiches is accounted for by the linear model on fat content. The remaining 2.37% is left in the residuals.
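The "fraction of variation" reading can be checked numerically by comparing the variation left in the residuals with the total variation in calories. This is a sketch using the nine sandwiches from the earlier table; `sse` and `sst` are hypothetical names for the sum of squared residuals and the total sum of squares.

```python
# (grams of fat, calories) for the nine sandwiches in the earlier table
fat = [37, 44, 31, 9, 12, 21, 40, 28, 16]
calories = [650, 730, 530, 230, 270, 460, 660, 520, 390]

n = len(fat)
x_bar = sum(fat) / n
y_bar = sum(calories) / n

# Least squares slope and intercept
sxx = sum((x - x_bar) ** 2 for x in fat)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fat, calories))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Variation left in the residuals versus total variation in calories
residuals = [y - (b0 + b1 * x) for x, y in zip(fat, calories)]
sse = sum(e ** 2 for e in residuals)
sst = sum((y - y_bar) ** 2 for y in calories)

r_squared = 1 - sse / sst
print(f"R-squared = {r_squared:.4f}; left in residuals = {1 - r_squared:.4f}")
```

The result should agree with the r² of 0.9763 quoted on the slide, up to rounding.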
![Page 18: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/18.jpg)
Residual Plot
A scatterplot of the residuals against the x-values (or the predicted values).
A noticeable pattern in the residual plot may indicate that the regression line is not a good model.
The residual plot of a well-fitting model shows random scatter with no pattern.
![Page 19: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/19.jpg)
What not to do
Don’t fit a straight line to a nonlinear relationship.
Beware of extraordinary points.
Don’t extrapolate beyond the data.
Don’t infer that x causes y just because there is a good linear model for their relationship.
Don’t choose a model based on r² alone.
![Page 20: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/20.jpg)
Breakfast Cereals, sugar and Calories
The following data come from 77 breakfast cereals and relate the sugar content of each cereal to its calories.
r = 0.564
Calories: mean 107.0, SD 19.5
Sugar: mean 7.0 grams, SD 4.4 grams
What is the slope of the regression line?
What is the y-intercept?
Write the regression equation.
Interpret.
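When only summary statistics are given, the slope and intercept formulas from earlier in the chapter are enough. The sketch below assumes sugar is x and calories is y; the function name `line_from_summary` is hypothetical.

```python
def line_from_summary(r, x_mean, x_sd, y_mean, y_sd):
    """Return (intercept, slope) from b1 = r * s_y / s_x and b0 = y-bar - b1 * x-bar."""
    slope = r * y_sd / x_sd
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Cereal summary statistics from the slide (x = sugar in grams, y = calories)
b0, b1 = line_from_summary(r=0.564, x_mean=7.0, x_sd=4.4, y_mean=107.0, y_sd=19.5)

print(f"calories-hat = {b0:.1f} + {b1:.2f} * sugar")
```

Read in context, the slope is roughly 2.5 additional calories per additional gram of sugar, on average, and the intercept is the predicted calories for a cereal with no sugar.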
![Page 21: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/21.jpg)
Urban planning
We want to estimate the costs per person associated with traffic delays
2002 Urban Mobility Report (70 cities in 2000)
Annual cost per person: mean $298.96, SD $180.83
Average speed per person: mean 54.34 mph, SD 4.494 mph
r = -0.90
Write an equation to model this situation.
What does the slope mean?
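A worked sketch of the calculation, assuming speed is the explanatory variable x and annual cost per person is the response y (the slide asks to estimate cost, which suggests this choice):

```latex
\begin{aligned}
b_1 &= r \,\frac{s_y}{s_x} = -0.90 \times \frac{180.83}{4.494} \approx -36.214 \\
b_0 &= \bar{y} - b_1 \bar{x} \approx 298.96 + 36.214 \times 54.34 \approx 2266.8 \\
\widehat{\text{cost}} &\approx 2266.8 - 36.21 \times \text{speed}
\end{aligned}
```

Under this model, each additional 1 mph of average speed is associated with an annual cost per person that is about $36.21 lower, on average.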
![Page 22: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/22.jpg)
What to watch out for in Regression
Interpreting beyond the data – extrapolating
Influential points
Lurking variables
Linear regression that is not “linear” – what to do
![Page 23: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/23.jpg)
Extrapolation
We cannot assume that a linear relationship in the data exists beyond the range of the data.
Once we venture into new x territory, such a prediction is called an extrapolation.
![Page 24: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/24.jpg)
Extrapolation (cont.)
A regression of mean age at first marriage for men vs. year fit to the first 4 decades of the 20th century does not hold for later years:
![Page 25: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/25.jpg)
Influential Outliers
We say that a point is influential if omitting it from the analysis gives a very different model.
![Page 26: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/26.jpg)
Outliers, Leverage, and Influence (cont.)
The following scatterplot shows that something was awry in Palm Beach County, Florida, during the 2000 presidential election…
![Page 27: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/27.jpg)
Lurking Variable
No matter how straight the line, no matter how strong the association, or how high the R-squared value is, there is no way to conclude from regression alone that one variable causes the other.
There is always the possibility that some third variable is driving both of the variables being observed.
![Page 28: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/28.jpg)
What to do when the relationship is not straight
Re-express the data with logs, square roots, reciprocals
We will look at square roots and logarithms, primarily
Example: taking the square root of the response variable and re-expressing the data in a scatterplot and examining the residual plot.
Example: re-expressing the data using logarithms of one or both variables, log(x) and log(y).
Fit a curve to the data – more difficult.
![Page 29: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/29.jpg)
The Ladder of Powers
| Power | Name | Comment |
|---|---|---|
| 2 | Square of data values | Try with unimodal distributions that are skewed to the left. |
| 1 | Raw data | Data with positive and negative values and no bounds are less likely to benefit from re-expression. |
| ½ | Square root of data values | Counts often benefit from a square root re-expression. |
| “0” | We’ll use logarithms here | Measurements that cannot be negative often benefit from a log re-expression. |
| -1/2 | Reciprocal square root | An uncommon re-expression, but sometimes useful. |
| -1 | The reciprocal of the data | Ratios of two quantities (e.g., mph) often benefit from a reciprocal. |
![Page 30: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/30.jpg)
Plan B: Attack of the Logarithms (cont.)
![Page 31: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/31.jpg)
Why Not Just a Curve?
If there’s a curve in the scatterplot, why not just fit a curve to the data?
![Page 32: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/32.jpg)
Why Not Just a Curve? (cont.)
The mathematics and calculations for “curves of best fit” are considerably more difficult than “lines of best fit.”
Besides, straight lines are easy to understand. We know how to think about the slope and the y-intercept.
![Page 33: Chapter 8 Linear Regression HOW CAN A MODEL BE CREATED WHICH REPRESENTS THE LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES?](https://reader037.vdocuments.net/reader037/viewer/2022110213/5697bfc21a28abf838ca50b4/html5/thumbnails/33.jpg)
Example: Data collected in the study of water pollution from commercial and domestic waste
| Day | Oxygen Demand |
|---|---|
| 1 | 109 |
| 2 | 149 |
| 3 | 149 |
| 5 | 191 |
| 7 | 213 |
| 10 | 224 |
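The slide does not say which re-expression to use for this data. As one possible sketch, the code below compares a straight-line fit on the raw days with a fit on log10(day); taking the log of the explanatory variable is an assumption made here for illustration, and `r_squared` is a hypothetical helper.

```python
from math import log10

# Data from the table: day and oxygen demand
day = [1, 2, 3, 5, 7, 10]
oxygen_demand = [109, 149, 149, 191, 213, 224]


def r_squared(x, y):
    """Squared correlation between x and y (R-squared of the least squares line)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


r2_raw = r_squared(day, oxygen_demand)
r2_log = r_squared([log10(d) for d in day], oxygen_demand)   # re-expressed x

print(f"R-squared, raw day:    {r2_raw:.3f}")
print(f"R-squared, log10(day): {r2_log:.3f}")
```

On this data the re-expressed fit gives a higher R-squared (roughly 0.97 versus 0.90), but R-squared alone should not decide the model; the residual plot of the re-expressed fit still needs to be checked for a pattern.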