100310plg500 l11-simple linear regression
TRANSCRIPT
-
8/6/2019 100310PLG500 L11-Simple Linear Regression
1/69
PLG 500 STATISTICAL REASONING IN EDUCATION
Lecture 11:
Simple Linear Regression
-
1. Simple Linear Regression
Simple linear regression is the process of predicting or estimating scores on one variable (Y), based on knowledge of scores on another variable (X), if Y and X are correlated
Y - the dependent, target or criterion variable
X - the independent, regressor or predictor variable
-
Example 1: Predicting scores of Y from scores of X when the correlation between Y and X is perfect (r = 1)
Suppose you are interested in predicting scores on Y, based on knowledge of scores on X, using the following hypothetical data:
X Y
2 3
4 4
6 5
8 6
10 7
12 8
14 9
a) Predict the score of Y when X = 16 (Y = 10)
b) Predict the score of Y when X = 125.5 (Y = ?)
-
Predicting the score of Y when X = 125.5
1. Plot a scatterplot
2. Draw a straight line that best fits the data
3. Determine the equation of the straight line
4. Use the equation of the straight line to predict the score of Y when X = 125.5
-
Step 1: Plot a scatterplot
[Figure: scatterplot of Y against X]
X Y
2 3
4 4
6 5
8 6
10 7
12 8
14 9
-
Step 2: Draw a straight line that best fits the data
[Figure: scatterplot of Y against X with a straight line through the points]
-
Step 3: Determine the equation of the straight line
The equation of a straight line:
Y = mX + c
where m = gradient (slope) of the straight line, $m = \frac{\text{vertical distance}}{\text{horizontal distance}}$
and c = Y-intercept, that is, the value of Y where the straight line intercepts the Y-axis
m = ?, c = ?
-
Step 3: Determine the equation of the straight line
$m = \frac{\text{vertical distance}}{\text{horizontal distance}} = \frac{1}{2} = 0.5$
m = 0.5 indicates that an increase of 0.5 units in Y is associated with an increase of 1 unit in X
c = Y-intercept = 2
Y = mX + c
Y = 0.5X + 2
[Figure: straight line fitted to the scatterplot of Y against X]
-
Step 4: Use the equation of the straight line to predict the score of Y when X = 125.5
The equation of the straight line:
Y = 0.5X + 2
When X = 125.5, Y = 0.5(125.5) + 2 = 64.75
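Steps 3 and 4 for Example 1 can be sketched in a few lines of Python. This is an illustration only, not part of the lecture: the function names `line_through` and `predict` are hypothetical, and the sketch assumes every point lies exactly on one straight line, which holds here because r = 1.

```python
# Example 1 data (perfect correlation, r = 1)
xs = [2, 4, 6, 8, 10, 12, 14]
ys = [3, 4, 5, 6, 7, 8, 9]

def line_through(x1, y1, x2, y2):
    """Return (m, c) for the straight line Y = mX + c through two points."""
    m = (y2 - y1) / (x2 - x1)  # gradient = vertical distance / horizontal distance
    c = y1 - m * x1            # Y-intercept: the value of Y when X = 0
    return m, c

def predict(m, c, x):
    """Predicted score of Y for a given score of X."""
    return m * x + c

# Any two points give the same line because the correlation is perfect
m, c = line_through(xs[0], ys[0], xs[-1], ys[-1])

# Check that every data point lies on the line
on_line = all(abs(predict(m, c, x) - y) < 1e-9 for x, y in zip(xs, ys))

print(m, c)                  # 0.5 2.0
print(predict(m, c, 16))     # 10.0
print(predict(m, c, 125.5))  # 64.75
```

With imperfect correlation no single line passes through every point, so two points no longer determine the line.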
-
Example 2: Predicting scores of Y from scores of X when the correlation between Y and X is not perfect (r ≠ 1)
Suppose you are interested in predicting students' scores on creativity (Y), based on knowledge of their scores on logical reasoning (X), using the following hypothetical data for 20 students:
Predict the creativity score for a student with a logical reasoning score of 25.
X 15 10 7 18 5 10 7 17 15 9 8 15 11 17 8 11 12 13 18 7
Y 12 13 9 18 7 9 14 16 10 12 7 13 14 19 10 16 12 16 19 11
X = 25, Y = ?
-
Predicting the creativity score for a student with a logical reasoning score of 25
1. Plot a scatterplot
2. Draw a straight line that best fits the data
3. Determine the equation of the straight line
4. Use the equation of the straight line to predict the creativity score for a student with a logical reasoning score of 25
X = 25, Y = ?
-
Step 1: Plot a scatterplot
[Figure: scatterplot of Creativity (Y) against Logical reasoning (X)]
X 15 10 7 18 5 10 7 17 15 9 8 15 11 17 8 11 12 13 18 7
Y 12 13 9 18 7 9 14 16 10 12 7 13 14 19 10 16 12 16 19 11
-
Step 2: Draw a straight line that best fits the data
Which is the line of best fit?
[Figure: scatterplot of Creativity (Y) against Logical reasoning (X)]
The method of least squares (the line of best fit)
-
The method of least squares
The method of least squares fits the straight line in such a way that the sum of squares of the differences between the actual value of Y and the predicted value of Y (Y') is a minimum
Or: $\sum (Y - Y')^2$ is a minimum
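The minimum property can be checked numerically on the Example 2 data. The sketch below is an illustration, not the lecturer's material: it computes the least-squares gradient and intercept with the standard closed-form formulas and compares the sum of squared differences against a few alternative lines. The alternative lines in `candidates` are arbitrary choices made up for the comparison.

```python
# Example 2 data: logical reasoning (X) and creativity (Y) for 20 students
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

def sum_squared_differences(b, a):
    """Sum of (Y - Y')^2 for the line Y' = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))

# Least-squares gradient and intercept (closed-form formulas)
n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
b_ls = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a_ls = (sum_y - b_ls * sum_x) / n
best = sum_squared_differences(b_ls, a_ls)

# Hypothetical alternative lines, chosen only for comparison
candidates = [(0.5, 7.0), (1.0, 1.0), (b_ls + 0.05, a_ls), (b_ls, a_ls - 0.5)]
for b, a in candidates:
    print(f"b = {b:.3f}, a = {a:.3f}: {sum_squared_differences(b, a):.2f}")
print(f"least squares line: {best:.2f}")
```

Every alternative line gives a larger sum than the least-squares line, including the two that differ from it only slightly.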
-
The method of least squares
[Figure: scatterplot of Creativity (Y) against Logical Reasoning (X)]
-
The method of least squares
[Figure: scatterplot of Creativity (Y) against Logical Reasoning (X) with the line of best fit]
The line of best fit is called the regression line.
-
The method of least squares
[Figure: scatterplot of Creativity (Y) against Logical Reasoning (X) with the regression line. For a data point, Y is the actual value of Y, Y' is the predicted value of Y on the regression line, and Y - Y' is the difference between the actual value of Y and the predicted value of Y.]
-
The method of least squares
$\sum (Y - Y')^2$ is a minimum
[Figure: scatterplot of Creativity (Y) against Logical Reasoning (X) with the regression line Y' = 0.65X + 5.28 (regression equation). Two points are annotated with the actual value of Y (19 and 16) and the predicted value of Y (16.33): Y - Y' = 19 - 16.33 = 2.67 and Y - Y' = 16 - 16.33 = -0.33.]
$\sum (Y - Y') = 0$
How to draw the regression line?
-
X  Y  Y' = 0.65X + 5.28  Y - Y'  (Y - Y')²
15 12 0.65(15) + 5.28 = 15.03 12 - 15.03 = -3.03 9.18
10 13 11.78 1.22 1.49
7 9 9.83 -0.83 0.69
18 18 16.98 1.02 1.04
5 7 8.53 -1.53 2.34
10 9 11.78 -2.78 7.73
7 14 9.83 4.17 17.39
17 16 16.33 -0.33 0.11
15 10 15.03 -5.03 25.30
9 12 11.13 0.87 0.76
8 7 10.48 -3.48 12.11
15 13 15.03 -2.03 4.12
11 14 12.43 1.57 2.46
17 19 16.33 2.67 7.13
8 10 10.48 -0.48 0.23
11 16 12.43 3.57 12.74
12 12 13.08 -1.08 1.17
13 16 13.73 2.27 5.15
18 19 16.98 2.02 4.08
7 11 9.83 1.17 1.37
ΣX = 233  ΣY = 257  Σ(Y - Y') = 0.00*  Σ(Y - Y')² = 116.59
*Σ(Y - Y') = -0.05, and does not equal zero because of rounding errors (Hinkle et al., 2003).
-
Step 3: Determine the equation of the regression line
To draw the regression line, we need to determine the equation of the regression line, which is called the regression equation
The regression equation is defined as follows:
Y' = bX + a
where
Y' = predicted score of Y
b = gradient of the regression line (regression coefficient)
X = the score used to predict the score of Y
a = Y-intercept (regression constant)
-
Regression coefficient, b
The value of b, which is the gradient of the regression line, is called the regression coefficient
The regression coefficient shows the amount of change in Y that is associated with a unit change in X
The formula for finding b is as follows:
$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$
-
Regression constant, a
The value of a, which is the Y-intercept of the regression line, is called the regression constant
The regression constant shows the value of Y where the regression line intercepts the Y-axis, or the value of Y when X equals 0
The formula for finding a is as follows:
$a = \bar{Y} - b\bar{X}$ or $a = \frac{\sum Y - b\sum X}{n}$
-
X Y XY X²
15 12 180 225
10 13 130 100
7 9 63 49
18 18 324 324
5 7 35 25
10 9 90 100
7 14 98 49
17 16 272 289
15 10 150 225
9 12 108 81
8 7 56 64
15 13 195 225
11 14 154 121
17 19 323 289
8 10 80 64
11 16 176 121
12 12 144 144
13 16 208 169
18 19 342 324
7 11 77 49
ΣX = 233  ΣY = 257  ΣXY = 3,205  ΣX² = 3,037  n = 20
$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$
$a = \frac{\sum Y - b\sum X}{n}$
-
Computing the value of the regression coefficient, b
ΣX = 233  ΣY = 257  ΣXY = 3,205  ΣX² = 3,037  n = 20
$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{20(3{,}205) - (233)(257)}{20(3{,}037) - (233)^2} = 0.65$
The positive value of b, that is 0.65, shows that a 0.65-unit increase in Y is associated with a 1-unit increase in X
-
Computing the value of the regression constant, a
ΣX = 233  ΣY = 257  n = 20
$a = \frac{\sum Y - b\sum X}{n} = \frac{257 - (0.65)(233)}{20} = 5.28$
The positive value of a, that is 5.28, shows that the regression line intercepts the Y-axis at 5.28
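The two hand calculations can be reproduced from the column totals. This is a minimal sketch for checking the arithmetic; the variable names are illustrative, and b is rounded to two decimals before computing a, as in the hand calculation.

```python
# Column totals for Example 2
n = 20
sum_x, sum_y = 233, 257
sum_xy, sum_x2 = 3205, 3037

# Regression coefficient: b = (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b_rounded = round(b, 2)

# Regression constant: a = (SumY - b*SumX) / n, using the rounded b
a = (sum_y - b_rounded * sum_x) / n
a_rounded = round(a, 2)

print(b_rounded, a_rounded)  # 0.65 5.28
```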
-
Determining the regression equation for Example 2:
Therefore, the regression equation of the regression line is:
Y' = bX + a
Y' = 0.65X + 5.28
-
The regression equation for Example 2
[Figure: scatterplot of Creativity (Y) against Logical Reasoning (X) with the regression line Y' = 0.65X + 5.28 (regression equation). The line intercepts the Y-axis at a = 5.28, and its gradient b = 0.65 is shown as a 0.65-unit increase in Y for a 1-unit increase in X.]
-
Step 4: Use the regression equation of the regression line to predict the creativity score (Y) of a student when his or her logical reasoning score (X) is 25
The regression equation of the regression line:
Y' = 0.65X + 5.28
When X = 25, Y' = 0.65(25) + 5.28 = 21.53
-
2. Obtaining the Regression Line Using SPSS
i. Obtaining the scatterplot using SPSS
ii. Obtaining the regression line using SPSS
-
i. Obtaining the scatterplot using SPSS
Create a file for the data set
Click Graphs > Scatter
Click Simple and then click Define
Click on the Y variable and click the arrow button to place it in the Y Axis box
Click on the X variable and click the arrow button to place it in the X Axis box
Click OK
-
ii. Obtaining the Regression Line Using SPSS
Double-click on the scatterplot
Click on any point in the scatterplot
Click Chart > Add Chart Element > Fit Line at Total
Click Linear > Apply > Close
Exit Chart Editor
-
[Figure: SPSS scatterplot with regression line]
-
3. Obtaining the Regression Equation Using SPSS
Create a file for the data set
Click Analyze > Regression > Linear
Click on the Y variable and click the arrow button to place it in the Dependent: box
Click on the X variable and click the arrow button to place it in the Independent(s): box
Click OK
-
3. Obtaining the Regression Equation Using SPSS
Coefficients(a)
Model | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig.
1 (Constant) | 5.231 | 1.746 | | 2.996 | .008
Logical Reasoning (X) | .654 | .142 | .736 | 4.615 | .000
a Dependent Variable: Creativity (Y)
Y' = bX + a, where b = .654 (B for Logical Reasoning (X)) and a = 5.231 (B for the Constant)
Y' = 0.654X + 5.231
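The SPSS constant (5.231) differs from the hand-calculated constant (5.28) only because the hand calculation rounds b to 0.65 before computing a. The sketch below illustrates this with the Example 2 data; the names `a_full` and `a_hand` are made up for the comparison.

```python
# Example 2 data: logical reasoning (X) and creativity (Y) for 20 students
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# Regression coefficient without intermediate rounding
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Regression constant computed two ways
a_full = (sum_y - b * sum_x) / n            # unrounded b, as a statistics package would do
a_hand = (sum_y - round(b, 2) * sum_x) / n  # b rounded to 0.65, as in the hand calculation

print(round(b, 3), round(a_full, 3), round(a_hand, 2))  # 0.654 5.231 5.28
```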
-
4. Errors in Prediction
Errors in prediction are the differences between the actual scores of Y and the predicted scores of Y (Y')
The formula for the calculation of the error in prediction (e) is as follows:
e = Y - Y'
-
4. Errors in Prediction
[Figure: scatterplot of Creativity (Y) against Logical Reasoning (X) with the regression line Y' = 0.65X + 5.28 (regression equation). The error e = Y - Y' is the distance between the actual value of Y and the predicted value of Y, for example 19 - 16.33 = 2.67 and 16 - 16.33 = -0.33.]
e = Y - Y'
-
X  Y  Y' = 0.65X + 5.28  e = Y - Y'  e² = (Y - Y')²
15 12 0.65(15) + 5.28 = 15.03 12 - 15.03 = -3.03 9.18
10 13 11.78 1.22 1.49
7 9 9.83 -0.83 0.69
18 18 16.98 1.02 1.04
5 7 8.53 -1.53 2.34
10 9 11.78 -2.78 7.73
7 14 9.83 4.17 17.39
17 16 16.33 -0.33 0.11
15 10 15.03 -5.03 25.30
9 12 11.13 0.87 0.76
8 7 10.48 -3.48 12.11
15 13 15.03 -2.03 4.12
11 14 12.43 1.57 2.46
17 19 16.33 2.67 7.13
8 10 10.48 -0.48 0.23
11 16 12.43 3.57 12.74
12 12 13.08 -1.08 1.17
13 16 13.73 2.27 5.15
18 19 16.98 2.02 4.08
7 11 9.83 1.17 1.37
ΣX = 233  ΣY = 257  Σe = 0.00*  Σe² = 116.59
*Σe = -0.05, and does not equal zero because of rounding errors (Hinkle et al., 2003).
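The table of errors can be regenerated with a short script. This is a sketch under the same assumption as the table, namely that the rounded regression equation Y' = 0.65X + 5.28 is used; the function name `predicted` is illustrative.

```python
# Example 2 data: logical reasoning (X) and creativity (Y) for 20 students
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

def predicted(x):
    """Predicted creativity score from the rounded regression equation."""
    return 0.65 * x + 5.28

# Error in prediction for each student: e = Y - Y'
errors = [y - predicted(x) for x, y in zip(X, Y)]
sum_e = sum(errors)
sum_e2 = sum(e * e for e in errors)

# One row per student: X, Y, Y', e, e^2
for x, y, e in zip(X, Y, errors):
    print(f"{x:>3} {y:>3} {predicted(x):6.2f} {e:6.2f} {e * e:6.2f}")

print(round(sum_e, 2), round(sum_e2, 2))  # -0.05 116.59
```

The sum of the errors is not exactly zero only because b and a were rounded to two decimals.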
-
5. Standard Error of Estimate
The standard deviation of the distribution of errors in prediction is called the standard error of estimate
The standard error of estimate is an overall measure of the extent to which the predicted Y values deviate from the actual Y values, and is represented by s_e
-
5. Standard Error of Estimate
The formula for the calculation of the standard error of estimate (s_e) is as follows:
$s_e = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$ or $s_e = \sqrt{\frac{\sum e^2}{n - 2}}$
-
X  Y  Y' = 0.65X + 5.28  Y - Y'  (Y - Y')²
15 12 0.65(15) + 5.28 = 15.03 12 - 15.03 = -3.03 9.18
10 13 11.78 1.22 1.49
7 9 9.83 -0.83 0.69
18 18 16.98 1.02 1.04
5 7 8.53 -1.53 2.34
10 9 11.78 -2.78 7.73
7 14 9.83 4.17 17.39
17 16 16.33 -0.33 0.11
15 10 15.03 -5.03 25.30
9 12 11.13 0.87 0.76
8 7 10.48 -3.48 12.11
15 13 15.03 -2.03 4.12
11 14 12.43 1.57 2.46
17 19 16.33 2.67 7.13
8 10 10.48 -0.48 0.23
11 16 12.43 3.57 12.74
12 12 13.08 -1.08 1.17
13 16 13.73 2.27 5.15
18 19 16.98 2.02 4.08
7 11 9.83 1.17 1.37
ΣX = 233  ΣY = 257  Σ(Y - Y') = 0.00*  Σ(Y - Y')² = 116.59
*Σ(Y - Y') = -0.05, and does not equal zero because of rounding errors (Hinkle et al., 2003).
-
5. Standard Error of Estimate
The standard error of estimate for the creativity score is:
Σ(Y - Y')² = 116.59, n = 20
$s_e = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} = \sqrt{\frac{116.59}{20 - 2}} = 2.55$
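The same calculation as a small function, for checking the arithmetic. The function name `standard_error_of_estimate` is illustrative, and the predictions again assume the rounded equation Y' = 0.65X + 5.28.

```python
import math

# Example 2 data: logical reasoning (X) and creativity (Y) for 20 students
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

def standard_error_of_estimate(actual, predicted):
    """s_e = sqrt( sum((Y - Y')^2) / (n - 2) )."""
    n = len(actual)
    sum_sq = sum((y - yp) ** 2 for y, yp in zip(actual, predicted))
    return math.sqrt(sum_sq / (n - 2))

# Predicted scores from the rounded regression equation
Y_pred = [0.65 * x + 5.28 for x in X]
s_e = standard_error_of_estimate(Y, Y_pred)

print(round(s_e, 3))  # 2.545
```

To two decimals this is the 2.55 obtained by hand.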
-
5. Standard Error of Estimate
The stronger the correlation between Y and X
The smaller the standard error of estimate
The greater the accuracy of prediction
-
5. Standard Error of Estimate
The stronger the correlation between Y and X (e.g., r = 1)
The smaller the standard error of estimate (e.g., s_e = 0)
The greater the accuracy of prediction (100% accurate)
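The link between r and s_e can be written explicitly. The identity below is a standard result for simple linear regression rather than something stated on the slides, and the value Σ(Y - Ȳ)² = 254.55 for the creativity scores is computed here from the Example 2 data.

```latex
\sum (Y - Y')^2 = (1 - r^2) \sum (Y - \bar{Y})^2
\quad\Longrightarrow\quad
s_e = \sqrt{\frac{(1 - r^2) \sum (Y - \bar{Y})^2}{n - 2}}

% r = \pm 1 gives s_e = 0 (every point lies on the regression line)
% Example 2: r = 0.736, \sum (Y - \bar{Y})^2 = 254.55, n = 20
s_e = \sqrt{\frac{(1 - 0.736^2)(254.55)}{20 - 2}} \approx 2.545
```

As r moves toward 0, the factor (1 - r²) moves toward 1 and s_e grows toward its largest value for the data.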
-
6. Obtaining the standard error of estimate using SPSS
Create a file for the data set
Click Analyze > Regression > Linear
Click on the Y variable and click the arrow button to place it in the Dependent: box
Click on the X variable and click the arrow button to place it in the Independent(s): box
Click OK
-
SPSS output
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .736(a) | .542 | .517 | 2.545
a Predictors: (Constant), Logical Reasoning (X)
s_e = 2.545
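The Model Summary values can be reproduced from the raw data. This is a sketch, not SPSS's own procedure: it uses the standard formulas for r, r², and the standard error of estimate, and it assumes the usual adjusted R² formula for a single predictor, 1 - (1 - R²)(n - 1)/(n - 2).

```python
import math

# Example 2 data: logical reasoning (X) and creativity (Y) for 20 students
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

# Sums of squares and cross-products about the means
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

r = s_xy / math.sqrt(ss_x * ss_y)              # R
r2 = r ** 2                                    # R Square
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)      # Adjusted R Square (one predictor)
s_e = math.sqrt((1 - r2) * ss_y / (n - 2))     # Std. Error of the Estimate

print(round(r, 3), round(r2, 3), round(adj_r2, 3), round(s_e, 3))
# 0.736 0.542 0.517 2.545
```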
-
7. Testing the regression coefficient for statistical significance
To determine whether the predictor variable (X) is a statistically significant predictor of the criterion variable (Y)
That is, to determine whether knowledge of scores on the X variable will enhance the prediction of scores on the Y variable
-
7. Testing the regression coefficient for statistical significance
Assumptions underlying the significance test for the regression coefficient:
1. The scores for each variable are normally distributed
2. The cases represent a random sample from the population
3. Both variables are independent
-
7. Testing the regression coefficient for statistical significance
Steps for the significance test:
1. State the null and alternative hypotheses
2. Set the criterion for rejecting the null hypothesis
3. Carry out the analysis using SPSS
4. Make a decision by applying the criterion for rejecting the null hypothesis
5. Make a conclusion in the context of the problem
-
Example 1:
Suppose you are interested in predicting students' scores on creativity (Y), based on knowledge of their scores on logical reasoning (X), using the following hypothetical data for 20 students.
Test whether logical reasoning is a statistically significant predictor of creativity at the 0.01 level of significance.
X 15 10 7 18 5 10 7 17 15 9 8 15 11 17 8 11 12 13 18 7
Y 12 13 9 18 7 9 14 16 10 12 7 13 14 19 10 16 12 16 19 11
-
Step 1: State the null and alternative hypotheses
H0: β = 0
(Logical reasoning is not a statistically significant predictor of creativity in the population)
H1: β ≠ 0
(Logical reasoning is a statistically significant predictor of creativity in the population)
β, or beta, is the population regression coefficient
-
Step 2: Set the criterion for rejecting the null hypothesis
Reject H0 if p < 0.01
p < 0.01 (The probability of committing a Type I error, that is, the likelihood of rejecting the null hypothesis when it is true, is less than 0.01)
0.01 is the level of significance (or α)
-
Step 3: Carry out the analysis using SPSS
Create a file for the data set
Click Analyze > Regression > Linear
Click on the Y variable and click the arrow button to place it in the Dependent: box
Click on the X variable and click the arrow button to place it in the Independent(s): box
Click OK
-
SPSS output
Coefficients(a)
Model | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig.
1 (Constant) | 5.231 | 1.746 | | 2.996 | .008
Logical Reasoning, X | .654 | .142 | .736 | 4.615 | .000
a Dependent Variable: Creativity, Y
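The Std. Error and t values for the regression coefficient in the output above can be reproduced by hand. The sketch below is an illustration, not SPSS's implementation: it uses the standard formulas SE(b) = s_e / sqrt(Σ(X - X̄)²) and t = b / SE(b) with n - 2 degrees of freedom, and the variable names are made up.

```python
import math

# Example 1 data: logical reasoning (X) and creativity (Y) for 20 students
X = [15, 10, 7, 18, 5, 10, 7, 17, 15, 9, 8, 15, 11, 17, 8, 11, 12, 13, 18, 7]
Y = [12, 13, 9, 18, 7, 9, 14, 16, 10, 12, 7, 13, 14, 19, 10, 16, 12, 16, 19, 11]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
ss_x = sum((x - mean_x) ** 2 for x in X)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Unrounded regression coefficient and constant
b = s_xy / ss_x
a = mean_y - b * mean_x

# Standard error of estimate from the unrounded regression line
sum_sq_errors = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))
s_e = math.sqrt(sum_sq_errors / (n - 2))

# Standard error of b, t statistic, and degrees of freedom
se_b = s_e / math.sqrt(ss_x)
t_b = b / se_b
df = n - 2

print(round(b, 3), round(se_b, 3), round(t_b, 3), df)  # 0.654 0.142 4.615 18
```

SPSS converts this t value, with 18 degrees of freedom, into the two-tailed probability reported in the Sig. column.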
Y' = bX + a, where b = .654 and a = 5.231
Y' = 0.654X + 5.231
p = .000 (< 0.01)
[...]
Click Analyze > Regression > Linear
Click on the Y variable and click the arrow button to place it in the Dependent: box
Click on the X variable and click the arrow button to place it in the Independent(s): box
Click OK
SPSS output
-
Coefficients(a)
Model | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig.
1 (Constant) | 74.033 | 5.939 | | 12.466 | .000
The order in which students turn in their test papers, X | -.004 | .697 | -.002 | -.006 | .995
a Dependent Variable: Test score, Y
t = -.006, p = .995 (> 0.05)
-
Step 4: Make a decision by applying the criterion for rejecting the null hypothesis
From the SPSS output, p = 0.995
(The probability of committing a Type I error, that is, the likelihood of rejecting the null hypothesis when it is true, is 0.995)
Therefore, fail to reject H0 because p > 0.05
-
Step 5: Make a conclusion in the context of the problem
The order in which students turn in their test papers is not a statistically significant predictor of their test scores in the population, t(13) = -.006, p > .05
(That is, knowledge of the order in which students turn in their test papers does not enhance the prediction of their test scores)
-
Step 5: Make a conclusion in the context of the problem
r² = 0.000
0% of the variance in the test scores can be associated with (explained by) the variance in the order in which students turn in their test papers
[Or 100% of the variance in the test scores cannot be associated with (explained by) the variance in the order in which students turn in their test papers]
-
Thank you for your attention