TRANSCRIPT
1
9-2 / 9.3
Correlation and Regression
2
Definition
Linear Correlation Coefficient r
measures the strength of the linear relationship between paired x and y values in a sample

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}} $$
3
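Formula for b0 and b1

$$ b_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \quad \text{(y-intercept)} $$

$$ b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \quad \text{(slope)} $$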
4
Review Calculations
Find the Correlation and the Regression Equation (Line of Best Fit)
Data from the Garbage Project
x Plastic (lb)      0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
y Household size    2     3     3     6     4     2     1     5
5
Review Calculations
Data from the Garbage Project
x Plastic (lb)      0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
y Household size    2     3     3     6     4     2     1     5
Using a calculator:
$b_0 = 0.549$
$b_1 = 1.48$
$\hat{y} = 0.549 + 1.48x$
$r = 0.842$
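The calculator results can be checked against the summation formulas for r, b0, and b1. The following is a minimal Python sketch, not part of the original slides; the function name `correlation_and_regression` is a hypothetical helper introduced only for illustration.

```python
from math import sqrt

# Garbage Project data from the slide: x = plastic (lb), y = household size
x = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
y = [2, 3, 3, 6, 4, 2, 1, 5]


def correlation_and_regression(x, y):
    """Return (r, b0, b1) using the summation formulas for paired data."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)              # sum of x squared
    syy = sum(v * v for v in y)              # sum of y squared
    sxy = sum(a * b for a, b in zip(x, y))   # sum of x times y

    x_term = n * sxx - sx ** 2               # n(sum x^2) - (sum x)^2
    y_term = n * syy - sy ** 2               # n(sum y^2) - (sum y)^2
    r = (n * sxy - sx * sy) / (sqrt(x_term) * sqrt(y_term))
    b1 = (n * sxy - sx * sy) / x_term        # slope
    b0 = (sy * sxx - sx * sxy) / x_term      # y-intercept
    return r, b0, b1


r, b0, b1 = correlation_and_regression(x, y)
print(f"r = {r:.3f}, b0 = {b0:.3f}, b1 = {b1:.2f}")
```

After rounding, the values should agree with the calculator output on the slide.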
6
Notes on correlation
r represents the linear correlation coefficient for a sample
$\rho$ (rho) represents the linear correlation coefficient for a population
$-1 \le r \le 1$
r measures the strength of a linear relationship
-1 is perfect negative correlation and 1 is perfect positive correlation
7
Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation.
Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.
8
Formal Hypothesis Test
Two methods
Both methods let
$H_0: \rho = 0$ (no significant linear correlation)
$H_1: \rho \neq 0$ (significant linear correlation)
9
Method 1: Test Statistic is t (follows format of earlier chapters)
Test statistic:

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

Critical values:
use Table A-3 with degrees of freedom = n - 2
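As a worked instance of Method 1, the sketch below evaluates the t statistic for the Garbage Project sample used in these slides (r = 0.842, n = 8). The helper name `t_statistic` is hypothetical, and the critical value 2.447 is an assumption taken from a standard t table (two tails, 6 degrees of freedom, $\alpha = 0.05$) rather than from the slides.

```python
from math import sqrt


def t_statistic(r, n):
    """Test statistic t = r / sqrt((1 - r^2) / (n - 2)) for H0: rho = 0."""
    return r / sqrt((1 - r ** 2) / (n - 2))


# Sample values used elsewhere in these slides (Garbage Project data)
r, n = 0.842, 8
t = t_statistic(r, n)

# Assumption: 2.447 is the two-tailed critical t value for n - 2 = 6
# degrees of freedom at alpha = 0.05, read from a standard t table.
t_critical = 2.447
reject_h0 = abs(t) > t_critical
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

Under these assumptions t is roughly 3.82, which exceeds the critical value, so Method 1 leads to rejecting $H_0$.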
10
Method 2: Test Statistic is r (uses fewer calculations)
Test statistic: r
Critical values: Refer to Table A-6 (no degrees of freedom)
Much easier
11
TABLE A-6 Critical Values of the Pearson Correlation Coefficient r

n      $\alpha$ = .05   $\alpha$ = .01
4      .950             .999
5      .878             .959
6      .811             .917
7      .754             .875
8      .707             .834
9      .666             .798
10     .632             .765
11     .602             .735
12     .576             .708
13     .553             .684
14     .532             .661
15     .514             .641
16     .497             .623
17     .482             .606
18     .468             .590
19     .456             .575
20     .444             .561
25     .396             .505
30     .361             .463
35     .335             .430
40     .312             .402
45     .294             .378
50     .279             .361
60     .254             .330
70     .236             .305
80     .220             .286
90     .207             .269
100    .196             .256
12
Is there a significant linear correlation?
Data from the Garbage Project
x Plastic (lb)      0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
y Household size    2     3     3     6     4     2     1     5
n = 8, $\alpha = 0.05$
$H_0: \rho = 0$
$H_1: \rho \neq 0$
Test statistic is r = 0.842
13
Is there a significant linear correlation?
n = 8, $\alpha = 0.05$
$H_0: \rho = 0$
$H_1: \rho \neq 0$
Test statistic is r = 0.842
Critical values are r = -0.707 and r = 0.707 (Table A-6 with n = 8 and $\alpha = 0.05$)
Table A-6 excerpt, row n = 8: $\alpha$ = .05 gives .707, $\alpha$ = .01 gives .834
14
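Is there a significant linear correlation?
[Figure: number line for r from -1 to 1 with critical values r = -0.707 and r = 0.707. Reject $\rho = 0$ in both tails; fail to reject $\rho = 0$ between the critical values. Sample data: r = 0.842.]
0.842 > 0.707; that is, the test statistic falls within the critical region.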
15
Is there a significant linear correlation?
[Figure: number line for r from -1 to 1 with critical values r = -0.707 and r = 0.707. Reject $\rho = 0$ in both tails; fail to reject $\rho = 0$ between the critical values. Sample data: r = 0.842.]
0.842 > 0.707; that is, the test statistic falls within the critical region.
Therefore, we REJECT $H_0: \rho = 0$ (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.
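The Method 2 decision rule can be sketched in a few lines of Python. This is an illustration only: the dictionary holds just a few rows of critical values copied from Table A-6 in these slides, and the names `TABLE_A6` and `is_significant` are hypothetical.

```python
# Critical values of r copied from Table A-6 in these slides (a few rows only).
# Keys are sample sizes n; values are (alpha = .05, alpha = .01).
TABLE_A6 = {
    4: (0.950, 0.999),
    5: (0.878, 0.959),
    6: (0.811, 0.917),
    7: (0.754, 0.875),
    8: (0.707, 0.834),
    9: (0.666, 0.798),
    10: (0.632, 0.765),
}


def is_significant(r, n, alpha=0.05):
    """Method 2: reject H0 (rho = 0) when |r| exceeds the critical value."""
    column = {0.05: 0, 0.01: 1}[alpha]   # pick the column for this alpha
    critical = TABLE_A6[n][column]
    return abs(r) > critical, critical


significant, critical = is_significant(0.842, 8)
print(f"critical value = {critical}, significant: {significant}")
```

Because the comparison uses the absolute value of r, the same rule covers both tails of the critical region.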
16
Regression
Definition
Regression Model
$y = \beta_0 + \beta_1 x + \varepsilon$
Regression Equation
$\hat{y} = b_0 + b_1 x$
Given a collection of paired data, the regression equation algebraically describes the relationship between the two variables.
17
Notation for Regression Equation
                                      Population Parameter                      Sample Statistic
y-intercept of regression equation    $\beta_0$                                 $b_0$
Slope of regression equation          $\beta_1$                                 $b_1$
Equation of the regression line       $y = \beta_0 + \beta_1 x + \varepsilon$   $\hat{y} = b_0 + b_1 x$
18
Regression
Definition
Regression Equation
Given a collection of paired data, the regression equation
$\hat{y} = b_0 + b_1 x$
algebraically describes the relationship between the two variables.
Regression Line (line of best fit or least-squares line)
is the graph of the regression equation.
19
Assumptions & Observations
1. We are investigating only linear relationships.
2. For each x value, y is a random variable having a normal distribution.
3. There are many methods for determining normality.
4. The regression line goes through $(\bar{x}, \bar{y})$.
20
Guidelines for Using the Regression Equation
1. If there is no significant linear correlation, don't use the regression equation to make predictions.
2. Stay within the scope of the available sample data when making predictions.
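One way to make these two guidelines concrete is to check them before evaluating the regression equation. The sketch below is only an illustration: the function `predict` and its parameters are hypothetical, and the coefficients are the Garbage Project results quoted in these slides.

```python
def predict(x_new, b0, b1, x_sample, significant):
    """Return the predicted y only when both guidelines are satisfied."""
    # Guideline 1: require a significant linear correlation.
    if not significant:
        raise ValueError("no significant linear correlation; "
                         "do not use the regression equation")
    # Guideline 2: stay within the range of the sample x values.
    if not (min(x_sample) <= x_new <= max(x_sample)):
        raise ValueError("x is outside the scope of the sample data")
    return b0 + b1 * x_new


# Garbage Project values from these slides: y-hat = 0.549 + 1.48x
x_sample = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
print(round(predict(2.0, 0.549, 1.48, x_sample, significant=True), 3))
```

Raising an error is one design choice among several; a caller could equally return a flag or a warning instead.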
21
Definitions
Outlier
a point lying far away from the other data points
Influential Points
points which strongly affect the graph of the regression line
22
Residuals and the Least-Squares Property
Definitions
Residual (error)
for a sample of paired (x, y) data, the difference $(y - \hat{y})$ between an observed sample y-value and the value of $\hat{y}$, which is the value of y that is predicted by using the regression equation.
Least-Squares Property
A straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.
23
Residuals and the Least-Squares Property
x    1    2    4    5
y    4    24   8    32
$\hat{y} = 5 + 4x$
[Figure: scatterplot of the four data points with the line $\hat{y} = 5 + 4x$; x-axis from 1 to 5, y-axis from 0 to 32. Residuals: -5 at x = 1, 11 at x = 2, -13 at x = 4, 7 at x = 5.]
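The residuals in this example can be recomputed from the data and the line $\hat{y} = 5 + 4x$. The Python sketch below is an illustration added here, with a hypothetical helper `sum_squared_residuals`; the alternative lines it tries are arbitrary choices, used only to show that they give a larger sum of squares.

```python
# Example data from the slide and its regression line y-hat = 5 + 4x
x = [1, 2, 4, 5]
y = [4, 24, 8, 32]


def sum_squared_residuals(b0, b1):
    """Sum of squared residuals (y - y_hat)^2 for the line y_hat = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


residuals = [yi - (5 + 4 * xi) for xi, yi in zip(x, y)]
sse = sum_squared_residuals(5, 4)
print(residuals, sse)

# Least-squares property: other lines should give a larger sum of squares.
# The alternative (b0, b1) pairs below are arbitrary, for illustration only.
for b0_alt, b1_alt in [(5, 3), (6, 4), (4, 4.5)]:
    assert sum_squared_residuals(b0_alt, b1_alt) > sse
```

Trying a handful of lines does not prove the property; it only illustrates that the regression line does better than these particular alternatives.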
24
Definitions
Total Deviation from the mean of the particular point (x, y)
the vertical distance $y - \bar{y}$, which is the distance between the point (x, y) and the horizontal line passing through the sample mean $\bar{y}$
Explained Deviation
the vertical distance $\hat{y} - \bar{y}$, which is the distance between the predicted y value and the horizontal line passing through the sample mean $\bar{y}$
Unexplained Deviation
the vertical distance $y - \hat{y}$, which is the vertical distance between the point (x, y) and the regression line. (The distance $y - \hat{y}$ is also called a residual, as defined in Section 9-3.)
25
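Unexplained, Explained, and Total Deviation
[Figure: graph of the regression line $\hat{y} = 5 + 4x$ and the horizontal line $\bar{y} = 17$, with x-axis from 0 to 9 and y-axis from 0 to 39. The points (5, 32), (5, 25), and (5, 17) mark the total deviation $(y - \bar{y})$, the explained deviation $(\hat{y} - \bar{y})$, and the unexplained deviation $(y - \hat{y})$.]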
26
$(y - \bar{y}) = (\hat{y} - \bar{y}) + (y - \hat{y})$
(total deviation) = (explained deviation) + (unexplained deviation)
(total variation) = (explained variation) + (unexplained variation)
$\sum (y - \bar{y})^2 = \sum (\hat{y} - \bar{y})^2 + \sum (y - \hat{y})^2$
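The variation identity can be checked numerically on the four-point example from the earlier slide (x = 1, 2, 4, 5; y = 4, 24, 8, 32; $\hat{y} = 5 + 4x$). This Python sketch is an added illustration; the variable names are chosen here and do not come from the slides.

```python
# Four-point example from the slides, with regression line y-hat = 5 + 4x
x = [1, 2, 4, 5]
y = [4, 24, 8, 32]

y_bar = sum(y) / len(y)              # sample mean of y
y_hat = [5 + 4 * xi for xi in x]     # predicted values from the regression line

total = sum((yi - y_bar) ** 2 for yi in y)                      # total variation
explained = sum((yh - y_bar) ** 2 for yh in y_hat)              # explained variation
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation

print(total, explained, unexplained)
```

The sums split cleanly here because $\hat{y} = 5 + 4x$ is the least-squares line for these data; for an arbitrary line the squared terms would not generally add up this way.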