CIVL 7012/8012
Simple Linear Regression
Lecture 2
Correlation
• Correlation is the degree to which two continuous variables are
linearly associated.
• This is most often represented by a scatterplot and the Pearson
correlation coefficient, denoted by 𝑟.
• The scatterplot provides a visual of how the two continuous
variables are correlated.
• The coefficient is a measure of the linear association between the
two variables.
Correlation
• If there is no correlation between the two variables, the points will
form a horizontal or vertical line, or show complete randomness (no obvious
pattern).
• Note that it does not matter which variable is on the x-axis and which is
on the y-axis.
• The pattern the two variables form determines the strength and
direction of their correlation.
Correlation
• The stronger the correlation, the more
linearly distinct the pattern will be.
• The coefficient is between -1 and 1.
+1 indicates a perfect positive correlation
-1 indicates a perfect negative correlation
0 indicates no correlation
• There are no strict rules for interpretation; as a guideline:
0 < |𝑟| < 0.3: weak correlation
0.3 ≤ |𝑟| < 0.7: moderate correlation
|𝑟| ≥ 0.7: strong correlation
Correlation
Snapshot from Multivariate Lecture 6
𝜌𝑋𝑌 is the correlation notation for the entire population.
Pearson correlation coefficient (𝑟) is for our sample representing
the population.
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
Correlation calculation
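| Meal | Bill ($) | Tip ($) | Bill deviations | Tip deviations | Deviation products | Bill deviations squared | Tip deviations squared |
|---|---|---|---|---|---|---|---|
| | $x$ | $y$ | $x_i - \bar{x}$ | $y_i - \bar{y}$ | $(x_i - \bar{x})(y_i - \bar{y})$ | $(x_i - \bar{x})^2$ | $(y_i - \bar{y})^2$ |
| 1 | 35 | 6 | -37.5 | -4 | 150 | 1406.25 | 16 |
| 2 | 110 | 18 | 37.5 | 8 | 300 | 1406.25 | 64 |
| 3 | 66 | 11 | -6.5 | 1 | -6.5 | 42.25 | 1 |
| 4 | 75 | 7 | 2.5 | -3 | -7.5 | 6.25 | 9 |
| 5 | 100 | 14 | 27.5 | 4 | 110 | 756.25 | 16 |
| 6 | 49 | 4 | -23.5 | -6 | 141 | 552.25 | 36 |
| Sum | | | | | 687 | 4169.5 | 142 |

$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{687}{\sqrt{(4169.5)(142)}} = 0.892$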
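The hand calculation above can be reproduced with a short script. This is a minimal sketch in plain Python; the function name `pearson_r` and the variable names are illustrative choices, not part of the lecture.

```python
import math

# Bill ($) and tip ($) for the six meals in the table above
bills = [35, 110, 66, 75, 100, 49]
tips = [6, 18, 11, 7, 14, 4]


def pearson_r(x, y):
    """Sample Pearson correlation computed from the deviation sums."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 687
    s_xx = sum((xi - x_bar) ** 2 for xi in x)                        # 4169.5
    s_yy = sum((yi - y_bar) ** 2 for yi in y)                        # 142
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(bills, tips)
print(round(r, 4))  # 0.8928
```

The unrounded value is about 0.8928, which the slide reports as 0.892.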
Correlation significance test (t-test)
• Is it statistically significant?
• Conduct a t-test
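• $H_0: \rho = 0$ vs. $H_1: \rho \neq 0$ at $\alpha = 0.05$
• $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with $df = n - 2$
• With $r = 0.892$ and $n = 6$: $t = \frac{0.892\sqrt{6-2}}{\sqrt{1-0.892^2}} = 3.947$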
Correlation significance test (t-test)
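• $H_0: \rho = 0$ vs. $H_1: \rho \neq 0$ at $\alpha = 0.05$
• $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with $df = n - 2$
• $t = \frac{0.892\sqrt{6-2}}{\sqrt{1-0.892^2}} = 3.947$
• $t_{calc} > t_{crit} \Rightarrow$ reject the null hypothesis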
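As a rough check of the test above, the statistic can be computed in a few lines of Python. This is a sketch only: the helper name `correlation_t` is hypothetical, and the critical value 2.776 for $\alpha = 0.05$ and 4 degrees of freedom is taken from the slope-test slides later in this lecture rather than computed.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t_calc = correlation_t(0.892, 6)  # r and n from the tip example
t_crit = 2.776                    # two-tailed critical t, alpha = 0.05, df = 4 (from a t table)
reject_null = abs(t_calc) > t_crit

print(round(t_calc, 3), reject_null)
```

Because the test is two-tailed, the comparison uses the absolute value of the calculated statistic.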
SLR Lecture 1 Recap
Recap - Quick Review
• SLR is a comparison of 2 models:
• One is where the independent variable does not exist
• And the other uses the best-fit regression line
• If there is only one variable, the best prediction for other
values is the mean of the dependent variable.
• The distance between the best-fit line and an observed
value is called the residual (or error).
• The residuals are squared and added together to
generate the sum of squared residuals/errors (SSE).
• SLR is designed to find the best fitting line through the
data that minimizes the SSE.
Recap - Example
[Figure: "Tips for service ($)"; x-axis: Meal #, y-axis: Tip ($); best-fit line at ȳ = 10]

| Meal # | Tip ($) |
|---|---|
| 1 | 6 |
| 2 | 18 |
| 3 | 11 |
| 4 | 7 |
| 5 | 14 |
| 6 | 4 |
Recap - Residuals (Errors)
[Figure: "Tips for service ($)" with each meal's residual from the mean line (ȳ = 10) and its squared value marked]

Squared Residuals (Errors)

| Meal # | Residual | Residual² |
|---|---|---|
| 1 | −4 | 16 |
| 2 | +8 | 64 |
| 3 | +1 | 1 |
| 4 | −3 | 9 |
| 5 | +4 | 16 |
| 6 | −6 | 36 |

Sum of squared errors (SSE): $\sum \text{Residual}^2 = 142$
Recap – Population vs. Sample Eq.
• If we knew our “population” parameters, $\beta_0$ and $\beta_1$, then we could use the SLR equation as is:
$E(y) = \beta_0 + \beta_1 x$
• In reality, we almost never have the population parameters. Therefore, we have to estimate them using sample data. With sample data, the SLR equation changes a bit:
$\hat{y} = b_0 + b_1 x$
• Where $\hat{y}$ (“y-hat”) is the point estimator of $E(y)$.
• Or, $\hat{y}$ is the mean value of $y$ for a given $x$.
Recap – OLS criterion
$y_i$ = observed value of the dependent variable (tip amount).
$\hat{y}_i$ = estimated (predicted) value of the dependent variable
(predicted tip amount based on the regression model).
$\min \sum (y_i - \hat{y}_i)^2$
[Figure: observed and predicted values plotted against the independent variable]
Recap - SLR parameter equations
$\hat{y}_i = b_0 + b_1 x$
Slope: $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
Intercept: $b_0 = \bar{y} - b_1 \bar{x}$
x̄ = mean of the independent variable ($ bill)
ȳ = mean of the dependent variable ($ tip)
𝑥𝑖 = value of the independent variable
𝑦𝑖 = value of the dependent variable
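The two parameter equations translate directly into code. The following is a minimal sketch in plain Python using the bill and tip data from this lecture; the function name `ols_fit` is an illustrative assumption.

```python
# Bill ($) is the independent variable x, tip ($) is the dependent variable y
bills = [35, 110, 66, 75, 100, 49]
tips = [6, 18, 11, 7, 14, 4]


def ols_fit(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Slope: sum of deviation products over sum of squared x deviations
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    # Intercept: the fitted line passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_fit(bills, tips)
print(round(b0, 4), round(b1, 4))  # -1.9457 0.1648
```

Note that the intercept is computed from the unrounded slope, which is why it matches the slide value of −1.9457.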
Recap - OLS Calculations
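| Meal | Bill ($) | Tip ($) | Bill deviations | Tip deviations | Deviation products | Bill deviations squared |
|---|---|---|---|---|---|---|
| | $x$ | $y$ | $x_i - \bar{x}$ | $y_i - \bar{y}$ | $(x_i - \bar{x})(y_i - \bar{y})$ | $(x_i - \bar{x})^2$ |
| 1 | 35 | 6 | -37.5 | -4 | 150 | 1406.25 |
| 2 | 110 | 18 | 37.5 | 8 | 300 | 1406.25 |
| 3 | 66 | 11 | -6.5 | 1 | -6.5 | 42.25 |
| 4 | 75 | 7 | 2.5 | -3 | -7.5 | 6.25 |
| 5 | 100 | 14 | 27.5 | 4 | 110 | 756.25 |
| 6 | 49 | 4 | -23.5 | -6 | 141 | 552.25 |
| | $\bar{x} = 72.5$ | $\bar{y} = 10$ | | | Sum = 687 | Sum = 4169.5 |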
Recap - OLS Calculations
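| Deviation products $(x_i - \bar{x})(y_i - \bar{y})$ | Bill deviations squared $(x_i - \bar{x})^2$ |
|---|---|
| 150 | 1406.25 |
| 300 | 1406.25 |
| -6.5 | 42.25 |
| -7.5 | 6.25 |
| 110 | 756.25 |
| 141 | 552.25 |
| Sum = 687 | Sum = 4169.5 |

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{687}{4169.5} = 0.1648$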
Recap - OLS Calculations
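| Bill ($) | Tip ($) |
|---|---|
| $x$ | $y$ |
| 35 | 6 |
| 110 | 18 |
| 66 | 11 |
| 75 | 7 |
| 100 | 14 |
| 49 | 4 |
| $\bar{x} = 72.5$ | $\bar{y} = 10$ |

$b_1 = 0.1648$
$b_0 = \bar{y} - b_1 \bar{x}$
$b_0 = 10 - 0.1648(72.5)$
$b_0 = 10 - 11.9457$ (using the unrounded slope 0.164768)
$b_0 = -1.9457$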
Recap – New Best-Fit Line & Parameters
$\hat{y}_i = b_0 + b_1 x$
$b_0 = -1.9457$ (intercept)
$b_1 = 0.1648$ (slope)
$\hat{y}_i = -1.9457 + 0.1648x$
OR
$\hat{y}_i = 0.1648x - 1.9457$
Recap - Final SLR line
[Figure: "Bill vs. Tip Amount ($)"; x-axis: Bill ($), y-axis: Tip ($); fitted line ŷ = −1.9457 + 0.1648x, with intercept b0 = −1.9457 and slope b1 = 0.1648]
Recap - SLR Model Interpretation
ŷ = −1.9457 + 0.1648𝑥
For every $1 the bill amount (𝑥) increases, we would expect the tip
amount to increase by $0.1648, or
about 16 cents (positive coefficient).
If the bill amount (𝑥) is zero, then the
expected/predicted tip amount is −$1.9457, or about negative $1.95!
Does this make any sense? No. In real-world problems, the intercept may or
may not make sense.
SLR – Lecture 2
Model fit and Coefficient of Determination
[Figure: two panels, "Tips ($)" by meal number with the mean line, and "Bills vs Tips ($)" with the regression line]
Tips only: SSE = 142 and SSE = SST.
With only the DV, the only sum of squares is due to error. Therefore, it is also the total, and maximum, sum of squares for this data sample: SST = 142.
Bills vs tips: SST = 142, SSE = ?
With both the IV and DV, SST remains the same, but the SSE is reduced significantly. The difference between the SST and the SSE is due to regression (SSR):
SST − SSE = SSR
Estimate regression values
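| Meal | Bill ($) | Tip ($) | Calculation | Predicted tip ($) |
|---|---|---|---|---|
| | $x_i$ | $y_i$ | $\hat{y}_i = -1.9457 + 0.1648x$ | $\hat{y}_i$ |
| 1 | 35 | 6 | $-1.9457 + 0.1648(35)$ | 3.8212 |
| 2 | 110 | 18 | $-1.9457 + 0.1648(110)$ | 16.1788 |
| 3 | 66 | 11 | $-1.9457 + 0.1648(66)$ | 8.9290 |
| 4 | 75 | 7 | $-1.9457 + 0.1648(75)$ | 10.4119 |
| 5 | 100 | 14 | $-1.9457 + 0.1648(100)$ | 14.5311 |
| 6 | 49 | 4 | $-1.9457 + 0.1648(49)$ | 6.1280 |
| | $\bar{x} = 72.5$ | $\bar{y} = 10$ | | |

$\min \sum (y_i - \hat{y}_i)^2$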
Regression errors (residuals)
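| Meal | Bill ($) | Tip ($) | Predicted tip ($) | Error (observed − predicted) |
|---|---|---|---|---|
| | $x$ | $y$ | $\hat{y}_i$ | $y - \hat{y}_i$ |
| 1 | 35 | 6 | 3.8212 | 6 − 3.8212 = 2.1788 |
| 2 | 110 | 18 | 16.1788 | 18 − 16.1788 = 1.8212 |
| 3 | 66 | 11 | 8.9290 | 11 − 8.9290 = 2.0710 |
| 4 | 75 | 7 | 10.4119 | 7 − 10.4119 = -3.4119 |
| 5 | 100 | 14 | 14.5311 | 14 − 14.5311 = -0.5311 |
| 6 | 49 | 4 | 6.1280 | 4 − 6.1280 = -2.1280 |
| | $\bar{x} = 72.5$ | $\bar{y} = 10$ | | |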
Regression errors (residuals) - SSE

| Meal | Bill ($) | Tip ($) | Predicted tip ($) | Error | Squared error |
|---|---|---|---|---|---|
| | $x$ | $y$ | $\hat{y}_i$ | $y - \hat{y}_i$ | $(y - \hat{y}_i)^2$ |
| 1 | 35 | 6 | 3.8212 | 2.1788 | 4.7472 |
| 2 | 110 | 18 | 16.1788 | 1.8212 | 3.3168 |
| 3 | 66 | 11 | 8.9290 | 2.0710 | 4.2890 |
| 4 | 75 | 7 | 10.4119 | -3.4119 | 11.6412 |
| 5 | 100 | 14 | 14.5311 | -0.5311 | 0.2821 |
| 6 | 49 | 4 | 6.1280 | -2.1280 | 4.5282 |
| | $\bar{x} = 72.5$ | $\bar{y} = 10$ | | | $SSE = 28.8044$ |
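The prediction, residual and SSE tables can be regenerated with a few lines of Python. This sketch assumes the coefficients to six significant figures (−1.94568 and 0.164768), as reported in the Excel output later in this lecture; using the four-decimal rounded values would shift the last digits of the predictions slightly.

```python
# Coefficients as reported in the Excel regression output
B0 = -1.94568
B1 = 0.164768

bills = [35, 110, 66, 75, 100, 49]
tips = [6, 18, 11, 7, 14, 4]

# Predicted tip for each bill, then observed minus predicted
predicted = [B0 + B1 * x for x in bills]
residuals = [y - y_hat for y, y_hat in zip(tips, predicted)]

# Sum of squared errors around the regression line
sse = sum(e ** 2 for e in residuals)

for meal, (y_hat, e) in enumerate(zip(predicted, residuals), start=1):
    print(meal, round(y_hat, 4), round(e, 4))
print("SSE =", round(sse, 4))
```

The residuals sum to roughly zero, which is a general property of a least-squares line that includes an intercept.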
SSE comparison
Sum of squared error (SSE) Comparison
D.V. (tip $) ONLY:
16 + 64 + 1 + 9 + 16 + 36 = SSE = 142
D.V. & I.V. (tip $ as a function of bill $):
4.7472 + 3.3168 + 4.2890 + 11.6412 + 0.2821 + 4.5282 = SSE = 28.8044 (up to rounding)
Comparison of two lines
• When we conducted the regression, the SSE decreased
from 142 to 28.8044.
• 28.8044 was explained by (allocated to) ERROR.
• What happened to the difference (113.1956)?
• 113.1956 is the sum of squares due to REGRESSION
(SSR).
• 𝑆𝑆𝑇 = 𝑆𝑆𝑅 + 𝑆𝑆𝐸
• In this case:
142 = 113.1956 + 28.8044
Comparison of two lines
[Figure: two panels, "Tips ($)" by meal number with the mean line, and "Bills vs Tips ($)" with the regression line]
Tips only: SSE = 142, SSE = SST, SST = 142
Bills vs tips: SST = 142, SSE = 28.8044, SST − SSE = SSR = 113.1956
Coefficient of Determination ($r^2$)
• How well does the estimated regression equation fit our
data?
• This is where regression starts to look a lot like ANOVA,
where the SST is partitioned into SSE & SSR.
• The larger the SSR the smaller the SSE.
• The Coefficient of Determination quantifies this ratio as a
percentage (%).
[Diagram: SST partitioned into SSR and SSE]
Coefficient of Determination $= r^2 = \frac{SSR}{SST}$
ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 113.1956 | 113.1956 | 15.7192 | 0.016611541 |
| Residual | 4 | 28.80441 | 7.201103 | | |
| Total | 5 | 142 | | | |
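The MS and F columns of the ANOVA table follow from the sums of squares and their degrees of freedom. A small sketch, with variable names chosen here for illustration:

```python
# Sums of squares and degrees of freedom from the ANOVA table
ssr, df_regression = 113.1956, 1   # one independent variable
sse, df_residual = 28.80441, 4     # n - 2 = 6 - 2

# Mean squares are sums of squares divided by their degrees of freedom
msr = ssr / df_regression
mse = sse / df_residual

# F statistic: explained variance relative to unexplained variance
f_stat = msr / mse

print(round(mse, 4), round(f_stat, 4))
```

The "Significance F" column is the p-value of this statistic and is not computed here, since that requires the F distribution.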
$r^2$ Interpretation
• Coefficient of Determination $= r^2 = \frac{SSR}{SST}$
• Coefficient of Determination $= r^2 = \frac{113.1956}{142}$
• Coefficient of Determination $= r^2 = 0.7972$, or 79.72%
• We can conclude that 79.72% of the total sum of squares
is explained by using the regression equation to predict
the tip amount, and that the remainder (20.28%) is error.
• This is a “good fit”!
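The ratio can be checked numerically. The sketch below also compares it with the square of the correlation coefficient, since in simple linear regression $r^2$ equals the squared Pearson correlation; the value 0.892834 is the "Multiple R" figure from the Excel output in this lecture.

```python
sst = 142.0       # total sum of squares
ssr = 113.1956    # sum of squares due to regression

r_squared = ssr / sst          # share of SST explained by the regression
unexplained = 1 - r_squared    # share left as error (SSE / SST)

# With one independent variable, r^2 matches the squared Pearson correlation
pearson_r = 0.892834
same_as_correlation = abs(r_squared - pearson_r ** 2) < 1e-5

print(round(r_squared, 4), round(unexplained, 4), same_as_correlation)
```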
3 squared differences
[Figure: "Bills vs. Tips ($)"; x-axis: Bill ($), y-axis: Tip ($); regression line ŷ = −1.9457 + 0.1648x and mean line ȳ = 10]
$SSE = \sum (y_i - \hat{y}_i)^2$
$SST = \sum (y_i - \bar{y})^2$
$SSR = \sum (\hat{y}_i - \bar{y})^2$
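All three sums can be computed from their definitions to confirm that they add up. A minimal sketch in plain Python that refits the line from the bill and tip data so that the identity holds to floating-point precision:

```python
bills = [35, 110, 66, 75, 100, 49]
tips = [6, 18, 11, 7, 14, 4]

n = len(bills)
x_bar = sum(bills) / n
y_bar = sum(tips) / n

# Least-squares slope and intercept
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(bills, tips))
      / sum((x - x_bar) ** 2 for x in bills))
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in bills]

sse = sum((y - y_hat) ** 2 for y, y_hat in zip(tips, predicted))  # observed vs predicted
sst = sum((y - y_bar) ** 2 for y in tips)                         # observed vs mean
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)            # predicted vs mean

print(round(sst, 4), round(ssr, 4), round(sse, 4))
```

SSR is computed here directly from the predicted values rather than as SST − SSE, so the agreement of the two is a genuine check of the partition.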
Model fit
$\hat{y}_i = -1.9457 + 0.1648x$
Questions:
• Once a regression line is calculated, how much better is it than
using the mean of the dependent variable alone? (coefficient of
determination, $r^2$)
• How confident are we in the significance of the relationship between $x$
and $y$? (t-test of the slope)
Regression with Excel
• Produce SLR model in Excel.
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.892834 |
| R Square | 0.797152 |
| Adjusted R Square | 0.74644 |
| Standard Error | 2.683487 |
| Observations | 6 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 113.1956 | 113.1956 | 15.7192 | 0.016611541 |
| Residual | 4 | 28.80441 | 7.201103 | | |
| Total | 5 | 142 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | -1.94568 | 3.205964 | -0.60689 | 0.576683 | -10.84685887 | 6.955504991 | -10.84685887 | 6.955504991 |
| X Variable 1 | 0.164768 | 0.041558 | 3.964745 | 0.016612 | 0.049383684 | 0.280152232 | 0.049383684 | 0.280152232 |
Testing slope -1
• Is the relationship between $y$ and $x$ significant?
• Test the slope $\beta_1$ (two-tailed t-test).
• Remember that $b_1$ is for our sample and $\beta_1$ is for the population.
• We will use our sample slope $b_1$ to test whether the true slope of
the population, $\beta_1$, is significantly different from 0.
$\hat{y}_i = -1.9457 + 0.1648x$
Testing slope -2
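Steps to conduct a t-test on the slope $\beta_1$:
• Step 1: Specify the hypotheses:
• $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$ at $\alpha = 0.05$
• Step 2: Determine the test statistic:
$t = \frac{b_1 - \beta_1}{SE_{b_1}}$
• where $\beta_1$ is the true slope for the whole population (0 under $H_0$)
• where $SE_{b_1} = \frac{\sqrt{SSE/(n-2)}}{\sqrt{\sum (x_i - \bar{x})^2}}$ = standard error of the slope $b_1$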
Testing slope -3
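• Step 2 calculation:
• $SE_{b_1} = \frac{\sqrt{SSE/(n-2)}}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{\sqrt{28.8044/(6-2)}}{\sqrt{4169.5}} = 0.0416$
• $t = \frac{b_1 - \beta_1}{SE_{b_1}} = \frac{0.1648 - 0}{0.0416} = 3.9615$
• Step 3: Quantify the evidence of the test
• Method 1: Critical value method
• Compare calculated $t$ to critical $t$
• $\pm t_{1-\alpha/2,\,n-2} = \pm t_{0.975,\,4}$
$\hat{y}_i = -1.9457 + 0.1648x$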
Testing slope -4
• Step 3: Quantify the evidence of the test
• Method 1: Critical value method
• Compare calculated $t$ to critical $t$ (remember $\alpha = 0.05$)
• $\pm t_{1-\alpha/2,\,n-2} = \pm t_{0.975,\,4} = \pm 2.776$
Testing slope -5
• Step 3: Method 1: Critical value method
• Compare calculated $t$ to critical $t$ (remember $\alpha = 0.05$)
• $t_{calculated} = 3.9615 > t_{critical} = 2.776$
• The calculated $t$ falls in the critical region, so we reject the null hypothesis $H_0: \beta_1 = 0$.
We conclude that $\beta_1 \neq 0$ and that there is a statistically significant
relationship between $x$ and $y$.
[Figure: $t$ distribution with central area 0.95 and area 0.025 in each tail]
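The critical value method for the slope can be sketched in Python as follows. The inputs are the rounded values from the slides, and the critical value 2.776 is taken from the $t$ table as above rather than computed. Because the standard error is not rounded to 0.0416 before dividing, the statistic comes out marginally different from the 3.9615 on the slide; the conclusion is unchanged.

```python
import math

# Values from the tip example
sse = 28.8044   # sum of squared errors around the regression line
s_xx = 4169.5   # sum of squared bill deviations
n = 6           # number of observations
b1 = 0.1648     # sample slope
beta1_h0 = 0.0  # slope under the null hypothesis

# Standard error of the slope and the test statistic
se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)
t_calc = (b1 - beta1_h0) / se_b1

t_crit = 2.776  # two-tailed critical t, alpha = 0.05, df = n - 2 = 4 (from a t table)
reject_null = abs(t_calc) > t_crit

print(round(se_b1, 4), round(t_calc, 2), reject_null)
```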
Testing slope -6
• Step 3: Method 2: p-value method
• Compare the calculated/estimated $p$-value to the desired significance
level (remember $\alpha = 0.05$).
• $p_{calculated} = 2P(t > \text{computed } t) = 2P(t > 3.9615) \approx 0.017$
(consistent with the P-value of 0.016612 in the Excel output)
• A $p$-value of 0.017 is less than $\alpha = 0.05$; therefore, we reject the null hypothesis
$H_0: \beta_1 = 0$. We conclude that $\beta_1 \neq 0$ and that there is a statistically
significant relationship between $x$ and $y$.
SLR Example with R
• Start an R session
• Load the dataset “airquality” included in base R
• Explore and plot the data
• Run a simple linear regression model with
“Ozone” as the DV (𝑦)
“Temp” as the IV (𝑥)
• Follow along in the R session; the model results are as follows:
SLR Example with R
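• Dataset: airquality (153 observations of 6 variables)
• Start an R session and follow the instructions in the code
• Use simple linear regression to predict ozone levels (“Ozone”) based on the
temperature (“Temp”).

| ID | Ozone | Solar.R | Wind | Temp | Month | Day |
|---|---|---|---|---|---|---|
| 1 | 41 | 190 | 7.4 | 67 | 5 | 1 |
| 2 | 36 | 118 | 8 | 72 | 5 | 2 |
| 3 | 12 | 149 | 12.6 | 74 | 5 | 3 |
| 4 | 18 | 313 | 11.5 | 62 | 5 | 4 |
| 5 | NA | NA | 14.3 | 56 | 5 | 5 |
| 6 | 28 | NA | 14.9 | 66 | 5 | 6 |
| 7 | 23 | 299 | 8.6 | 65 | 5 | 7 |
| 8 | 19 | 99 | 13.8 | 59 | 5 | 8 |
| 9 | 8 | 19 | 20.1 | 61 | 5 | 9 |
| 10 | NA | 194 | 8.6 | 69 | 5 | 10 |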
Step 1: scatter plot
| Ozone | Temp |
|---|---|
| 41 | 67 |
| 36 | 72 |
| 12 | 74 |
| 18 | 62 |
| NA | 56 |
| 28 | 66 |
| 23 | 65 |
| 19 | 59 |
| 8 | 61 |
| NA | 69 |
STEP 3: CORRELATION (Ozone vs Temp)
• What is the correlation coefficient ($r$) for Ozone vs. Temp? (see R session)
In this case, $r = 0.698$
• Is the relationship strong?
Moderate. Next, run the model (see R session).
Model results (model m1)
• $\hat{y} = b_0 + b_1 x$
• $b_0 = -146.996$ (intercept), $b_1 = +2.429$ (slope)
• Regression line for this model: $\hat{y} = -146.996 + 2.429x$
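To illustrate how the fitted line is used, the sketch below plugs a temperature into the regression equation. The coefficients are the ones reported for model m1; the function name `predict_ozone` and the example temperature of 80 ℉ are assumptions made here for illustration, and the line should only be trusted within the range of temperatures observed in the data.

```python
# Coefficients reported for model m1 (Ozone as a function of Temp)
B0 = -146.996  # intercept
B1 = 2.429     # slope: change in Ozone per 1 degree F


def predict_ozone(temp_f):
    """Predicted Ozone level for a temperature in degrees Fahrenheit."""
    return B0 + B1 * temp_f


ozone_at_80 = predict_ozone(80)
change_per_degree = predict_ozone(81) - predict_ozone(80)

print(round(ozone_at_80, 3), round(change_per_degree, 3))
```

The difference between two predictions one degree apart recovers the slope, which is the interpretation given on the following slides.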
Results interpretation (model m1) -1
Residuals:
• Residuals are the differences between the actual observed response values
(Ozone levels in our case) and the response values that the
model predicted.
• The “Residuals” section of the model output gives a five-point summary
(minimum, first quartile, median, third quartile, maximum) to assess how well the model fits the data.
• A well-fitting model will show residuals that are roughly symmetric, from the minimum to the maximum, around the mean
value (0).
• We do not have very good symmetry here.
• So, some of the model's predictions fall far from the actual
observed points.
Results interpretation (model m1) -2
Model Coefficients:
• $b_0 = -146.996$ ($y$-intercept)
This is the predicted Ozone level when Temp = 0; it has no practical interpretation here.
• $b_1 = +2.429$ (slope)
For every 1 ℉ increase in temperature ($x$), the
Ozone level is expected to increase by 2.429 units.
• Std. error = 0.2331
This is the standard error of the slope: the estimated slope would typically vary by about 0.2331 units from sample to sample.
• t-value for “Temp” $= \frac{\text{coefficient}}{\text{std. error}} = \frac{2.429}{0.233} = 10.418$
The t-value is significant, with Pr(>|t|) < 2e−16, which is significant at any conventional level of
significance (for example, $\alpha = 0.001$, or a 99.9% level of confidence).
Results interpretation (model m1) -3
• Residual Standard Error = 23.71 on 114 degrees of freedom
• The Residual Standard Error is the typical (roughly average) amount by which the response
“Ozone” deviates from the fitted regression line.
• In our example, the observed Ozone level deviates from the regression
line by approximately 23.71 units, on average.
• Degrees of freedom are the number of data points (observations) used
minus 2 (accounting for the two estimated parameters: the intercept and the
slope of “Temp”).
We started the model with 153 data points in the “airquality” dataset.
We removed 37 data points that had missing (NA) values.
We are left with 116 data points.
116 data points minus 2 parameters gives 114 DF.
Results interpretation (model m1) -4
• $R$-squared = 0.4877 ($R^2$ = coefficient of determination)
$R^2$ varies from 0 to 1; in this case, 48.77% of the variation in $y$ is explained by $x$.
• Adjusted $R^2$ = 0.4832
Adjusted $R^2$ accounts for how many independent variables entered the
model. It is typically lower than $R^2$, depending on how much the
additional independent variables ($x$'s) contribute to explaining $y$.
A sharp drop in the adjusted $R^2$ relative to $R^2$ indicates a bad model.
$F$-Test (the $F$-value is used for measuring the overall model significance).
• At the desired level of significance (say $\alpha = 0.05$), the statistical significance of
the $F$-test shows whether the model as a whole is significant.
• In this model, the $F$-statistic = 108.5 on 1 and 114 degrees of freedom.
• The $F$-statistic has Pr(> $F$) < 2.2e−16; that is, the $F$-statistic
is significant at any reasonable level of significance (or you could say at
99.99% confidence).
SLR – R code