regression - uprag
Regression
Chapter 10
Understandable Statistics
Ninth Edition
By Brase and Brase
Prepared by Yixun Shi
Bloomsburg University of Pennsylvania
Copyright © Houghton Mifflin Harcourt Publishing Company. All rights reserved.
Scatter Diagrams
• A graph in which data pairs (x, y) are plotted as points, with x on the horizontal axis and y on the vertical axis.
• The explanatory variable is x.
• The response variable is y.
• One goal of plotting paired data is to determine if there is a linear relationship between x and y.
Finding the Best Fitting Line
• One goal of data analysis is to find a mathematical equation that “best” represents the data.
• For our purposes, “best” means the line that comes closest to each point on the scatter diagram.
Not All Relationships are Linear
Correlation Strength
• Correlation is a measure of the linear relationship between two quantitative variables.
• If the points lie exactly on a straight line, we say there is perfect correlation.
• As the strength of the relationship decreases, the correlation moves from perfect to strong to moderate to weak.
Correlation
• If the response variable, y, tends to increase as the explanatory variable, x, increases, the variables have positive correlation.
• If the response variable, y, tends to decrease as the explanatory variable, x, increases, the variables have negative correlation.
Correlation Examples
Correlation Coefficient
• The mathematical measurement that describes the correlation is called the sample correlation coefficient.
• Denoted by r.
Correlation Coefficient Features
1) r has no units.
2) -1 ≤ r ≤ 1
3) Positive values of r indicate a positive relationship between x and y (likewise for negative values).
4) r = 0 indicates no linear relationship.
5) Switching the explanatory variable and response variable does not change r.
6) Changing the units of the variables does not change r.
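Features 5 and 6 can be checked numerically. The sketch below is illustrative only: the data values and the helper name `pearson_r` are assumptions, not taken from the slides, and the function uses the standard definition of r based on deviations from the means.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                       # feature 5: swap the roles of x and y
r_rescaled = pearson_r([100 * v for v in x], y)   # feature 6: change the units of x

print(r, r_swapped, r_rescaled)
```

All three values agree up to floating-point rounding, and each lies between -1 and 1.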
r = 0
r = 1 or r = -1
0 < r < 1
-1 < r < 0
Developing a Formula for r
A Computational Formula for r
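The formula on this slide was not captured in the transcript. A commonly used computational form, assumed here to be the one intended, is r = (nΣxy – ΣxΣy) / (√(nΣx² – (Σx)²) · √(nΣy² – (Σy)²)). The sketch below applies it to made-up data; the function name `r_computational` is hypothetical.

```python
import math

def r_computational(xs, ys):
    """Correlation coefficient from raw sums (assumed computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up illustrative data (not from the slides).
r = r_computational([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))
```

For these data the sums are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, Σy² = 86, so r = 30 / √1500.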
Critical Thinking: Population vs. Sample Correlation
• Beware: a high correlation does not imply a cause-and-effect relationship between the explanatory and response variables.
Critical Thinking: Lurking Variables
• A lurking variable is neither the explanatory variable nor the response variable.
• Beware: lurking variables may be responsible for the evident changes in x and y.
Correlations Between Averages
• If the two variables of interest are averages rather than raw data values, the correlation between the averages will tend to be higher!
Linear Regression
• When we are in the “paired-data” setting:
– Do the data points have a linear relationship?
– Can we find an equation for the best fitting line?
– Can we predict the value of the response variable for a new value of the predictor variable?
– What fractional part of the variability in y is associated with the variability in x?
Least-Squares Criterion
• The line that we fit to the data is the one that minimizes the sum of the squared vertical distances from the data points to the line.
• Let d = the vertical distance from any data point to the regression line; the least-squares criterion minimizes Σd².
Least-Squares Line
• The least-squares line is ŷ = a + bx.
• a is the y-intercept.
• b is the slope.
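As a minimal sketch of how a and b can be obtained, the code below uses the standard least-squares formulas b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and a = ȳ – b·x̄. These are assumed to be equivalent to the formulas on the slides, which the transcript did not capture. The data and function names are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y_hat = a + b*x minimizing the sum of squared vertical distances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept; puts (x_bar, y_bar) on the line
    return a, b

def sum_squared_distances(xs, ys, a, b):
    """Sum of squared vertical distances d from each point to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_line(x, y)
best = sum_squared_distances(x, y, a, b)
tilted = sum_squared_distances(x, y, a, b + 0.1)   # a slightly different line

print(a, b, best, tilted)
```

Any other line, such as the tilted one here, gives a larger sum of squared distances than the fitted line.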
Finding the Regression Equation
Using the Regression Equation
• The point (x̄, ȳ) is always on the least-squares line.
• Also, the slope tells us that if we increase the explanatory variable by one unit, the predicted response will change by the slope (increase or decrease, depending on the sign of the slope).
Critical Thinking: Making Predictions
• We can simply plug x values into the regression equation to calculate predicted y values.
• Extrapolation, that is, predicting y for x values outside the range of the observed data, can result in unreliable predictions.
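One way to make the extrapolation warning concrete is to have the prediction step report whether the new x lies outside the observed range. This is only a sketch: the coefficients, the observed range, and the function name `predict` are hypothetical values chosen for illustration.

```python
def predict(a, b, x_new, x_min, x_max):
    """Return (y_hat, extrapolating) for the line y_hat = a + b*x.

    extrapolating is True when x_new lies outside the observed x-range.
    """
    y_hat = a + b * x_new
    extrapolating = not (x_min <= x_new <= x_max)
    return y_hat, extrapolating

# Hypothetical fitted line y_hat = 2.2 + 0.6x from data observed on 1 <= x <= 5.
inside = predict(2.2, 0.6, 3.5, x_min=1, x_max=5)
outside = predict(2.2, 0.6, 10, x_min=1, x_max=5)

print(inside)    # prediction within the observed range
print(outside)   # flagged as extrapolation
```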
Coefficient of Determination
• Another way to gauge the fit of the regression equation is to calculate the coefficient of determination, r².
• We can view the mean of y as a baseline for all the y values.
• Then the total variation about the mean splits into an explained part and an unexplained part: Σ(y – ȳ)² = Σ(ŷ – ȳ)² + Σ(y – ŷ)².
We algebraically manipulate the above equation to obtain: r² = Σ(ŷ – ȳ)² / Σ(y – ȳ)².
Thus, r² is the proportion of the total variability in y that is explained by the regression on x.
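The interpretation of r² as explained variation divided by total variation can be checked directly. The sketch below assumes the standard least-squares fit and uses made-up data; the function name is hypothetical.

```python
def coefficient_of_determination(xs, ys):
    """r^2 computed as explained variation / total variation about the mean of y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    predicted = [a + b * x for x in xs]
    explained = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
    total = sum((y - y_bar) ** 2 for y in ys)
    return explained / total

# Made-up illustrative data (not from the slides).
r_squared = coefficient_of_determination([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r_squared, 4))
```

Here the explained variation is 3.6 and the total variation is 6, so about 60% of the variability in y is associated with the regression on x.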
Inferences on the Regression Equation
• We can make inferences about the population correlation coefficient, ρ, and the population regression line slope, β.
• Inference requirements:
– The set of ordered pairs (x, y) constitutes a random sample from all ordered pairs in the population.
– For each fixed value of x, y has a normal distribution.
– For every x, the distribution of y has the same variance and a mean that lies on the population regression line.
Testing the Correlation Coefficient
• We convert r to a Student’s t statistic with n – 2 degrees of freedom.
• The sample size must be ≥ 3.
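The conversion formula itself is not in the transcript. The standard conversion, assumed here, is t = r·√(n – 2) / √(1 – r²) with n – 2 degrees of freedom, which also shows why at least three data pairs are needed. The values of r and n below are made up.

```python
import math

def t_for_correlation(r, n):
    """Convert a sample correlation r (with |r| < 1) to a t statistic, d.f. = n - 2."""
    if n < 3:
        raise ValueError("need at least 3 data pairs so that d.f. = n - 2 >= 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up example: r = 0.8 from n = 11 data pairs, so d.f. = 9.
t = t_for_correlation(0.8, 11)
print(round(t, 4))
```

The resulting t value is then compared with a Student's t distribution with n – 2 degrees of freedom to test H0: ρ = 0.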
Testing Procedure
Standard Error of Estimate
• As the points get closer to the regression line, Se decreases.
• If all the points lie exactly on the line, Se will be zero.
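Both bullets can be seen in a short computation. The sketch assumes the usual definition Se = √(Σ(y – ŷ)² / (n – 2)), which the transcript does not show, and uses made-up data and a hypothetical function name.

```python
import math

def standard_error_of_estimate(xs, ys):
    """Se = sqrt(sum of squared residuals / (n - 2)) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

# Made-up data: one scattered set, one lying exactly on y = 1 + 2x.
se_scattered = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
se_perfect = standard_error_of_estimate([1, 2, 3], [3, 5, 7])

print(se_scattered, se_perfect)
```

The scattered data give a positive Se, while the points that lie exactly on a line give Se = 0.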
Computational Formula For Se
Confidence Intervals for y
• The population regression line: y = α + βx + ε, where ε is random error.
• Because of ε, for each x value there is a corresponding distribution for y.
• Each y distribution has the same standard deviation, estimated by Se.
• Each y distribution is centered on the regression line.
Confidence Intervals for Predicted y
• The confidence intervals will be narrower near the mean of x.
• The confidence intervals will be wider near the extreme values of x.
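The widening of the interval away from x̄ can be seen from the margin of error. The sketch assumes the standard form E = t_c · Se · √(1 + 1/n + (x – x̄)² / Σ(x – x̄)²), which the transcript does not show. The critical value t_c is passed in as read from a t table, and the data and names are made up.

```python
import math

def prediction_margin(xs, ys, x_new, t_critical):
    """Margin of error E for the interval y_hat - E < y < y_hat + E at x_new."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))
    return t_critical * se * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)

# Made-up illustrative data; x_bar = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRITICAL = 3.182   # t table value for 95% confidence with d.f. = n - 2 = 3

margin_at_mean = prediction_margin(x, y, 3, T_CRITICAL)
margin_at_edge = prediction_margin(x, y, 5, T_CRITICAL)

print(margin_at_mean, margin_at_edge)
```

The margin is smallest at x = x̄ and grows as x moves toward the extremes, because of the (x – x̄)² term.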
Inferences About Slope β
• Recall the population regression equation: y = α + βx + ε
• Let Se be the standard error of estimate from the sample.
• The test statistic for the sample slope b has a Student’s t distribution with n – 2 degrees of freedom.
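The statistic itself is missing from the transcript. A standard form, assumed here, is t = (b – β) / (Se / √Σ(x – x̄)²), and the matching confidence interval is b ± t_c · Se / √Σ(x – x̄)². The sketch below tests H0: β = 0 on made-up data; the function name and the critical value passed in are illustrative.

```python
import math

def slope_inference(xs, ys, t_critical):
    """Return (t statistic for H0: beta = 0, confidence interval for beta)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))      # standard error of estimate
    se_b = se / math.sqrt(sxx)         # standard error of the slope
    t_stat = b / se_b
    margin = t_critical * se_b
    return t_stat, (b - margin, b + margin)

# Made-up illustrative data; 3.182 is the 95% t table value for d.f. = 3.
t_stat, (low, high) = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3.182)
print(round(t_stat, 4), round(low, 4), round(high, 4))
```

With one explanatory variable, this t value equals the one obtained by converting r, so testing β = 0 and testing ρ = 0 lead to the same conclusion.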
Testing β
Finding a Confidence Interval for β
Multiple Regression
• In practice, most predictions will improve if we consider additional data.
• Statistically, this means adding predictor variables to our regression techniques.
Multiple Regression Terminology
• y is the response variable.
• x1 … xk are explanatory variables.
• b0 … bk are coefficients.
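These terms combine into an equation of the form ŷ = b0 + b1x1 + … + bkxk. The sketch below evaluates such an equation. The coefficient values are hypothetical and stand in for numbers that software would report.

```python
def predict_multiple(coefficients, x_values):
    """Evaluate y_hat = b0 + b1*x1 + ... + bk*xk.

    coefficients is [b0, b1, ..., bk]; x_values is [x1, ..., xk].
    """
    b0, *slopes = coefficients
    if len(slopes) != len(x_values):
        raise ValueError("need exactly one x value per slope coefficient")
    return b0 + sum(b * x for b, x in zip(slopes, x_values))

# Hypothetical coefficients b0 = 1.5, b1 = 2.0, b2 = -0.5 with x1 = 3, x2 = 4.
y_hat = predict_multiple([1.5, 2.0, -0.5], [3.0, 4.0])
print(y_hat)
```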
Regression Models – A mathematical package that includes the following:
1) A collection of variables, one of which is the response, and the rest are predictors.
2) A collection of numerical values associated with the given variables.
3) Using software and the above numerical values, construct a least-squares regression equation.
4) The model includes information about the variables, coefficients of the equation, and various “goodness of fit” measures.
5) The model allows the user to make predictions and build confidence intervals for various components of the equation.
Coefficient of Multiple Determination
• How well does our equation fit the data?
– Just as in linear regression, computer output will provide r².
• r² is a measure of the amount of variability in y that is explained by the regression on all of the x variables.
Predictions in the Multiple Regression Case
• Similar to the techniques described for linear regression, we can “plug in” values for all of the x variables and compute a predicted y value.
– Predicting y for any x values outside the observed x-range is extrapolation, and the results may be unreliable.
Testing a Coefficient for Significance
• At times, one or more of the predictor variables may not be helpful in determining y.
• Using software, we can test any or all of the x variables for statistical significance.
• The hypotheses will be:
H0: βi = 0
H1: βi ≠ 0
for the ith coefficient in the model.
• If we fail to reject the null hypothesis for a given coefficient, we may consider removing that predictor from the model.
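In practice the decision is usually read from the P-values that software reports for each coefficient. The sketch below assumes the common rule "fail to reject H0 when the P-value exceeds α". The P-values, predictor names, and function name are made up for illustration.

```python
def predictors_to_review(p_values, alpha=0.05):
    """Names of predictors whose test fails to reject H0: beta_i = 0 at level alpha."""
    return [name for name, p in p_values.items() if p > alpha]

# Hypothetical P-values as they might appear in regression software output.
reported = {"x1": 0.003, "x2": 0.410, "x3": 0.048}
candidates = predictors_to_review(reported)
print(candidates)
```

A predictor on this list is only a candidate for removal. The model should be refit and checked after any predictor is dropped.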
Confidence Intervals for Coefficients
• As in the linear case, software can provide confidence intervals for any βi in the model.