146 41 multiple_regression
TRANSCRIPT
![Page 1: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/1.jpg)
MATH& 146
Lesson 41
Section 6.1
Multiple Regression
1
![Page 2: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/2.jpg)
Multiple Regression
The principles of simple linear regression lay the foundation for more sophisticated regression methods used in a wide range of challenging settings.
Multiple regression extends simple two-variable regression to the case that still has one response but many predictors (denoted x1, x2, x3, ...). The method is motivated by scenarios where many variables may be simultaneously connected to an output.
2
![Page 3: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/3.jpg)
Mario Kart
We will consider eBay auctions of a video game
called Mario Kart for the Nintendo Wii. The
outcome variable of interest is the total price of an
auction, which is the highest bid plus the shipping
cost.
We will try to determine how total price is related to
each characteristic in an auction while
simultaneously controlling for other variables.
3
![Page 4: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/4.jpg)
Mario Kart
For instance, all other characteristics held
constant, are longer auctions associated with
higher or lower prices?
And, on average, how much more do buyers tend
to pay for additional Wii wheels (plastic steering
wheels that attach to the Wii controller) in
auctions?
Multiple regression can help us answer these and
other questions.
4
![Page 5: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/5.jpg)
Mario Kart
The Mario Kart data set includes results from 141
auctions. Four observations from this data set are
shown in the table below.
5
price cond_new stock_photo duration wheels
1 51.55 1 1 3 1
2 37.04 0 1 7 1
… … … … … …
140 38.76 0 0 7 0
141 54.51 1 1 1 2
![Page 6: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/6.jpg)
Mario Kart
Descriptions for each variable are shown below.
Notice that the cond_new and stock_photo
variables are indicator variables (variables that
only have values of 0 or 1).
6
![Page 7: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/7.jpg)
Mario Kart
For instance, the cond_new variable takes value 1
if the game up for auction is new and 0 if it is used.
Using indicator variables in place of category
names allows for these variables to be directly
used in regression.
7
price cond_new stock_photo duration wheels
1 51.55 1 1 3 1
2 37.04 0 1 7 1
… … … … … …
140 38.76 0 0 7 0
141 54.51 1 1 1 2
![Page 8: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/8.jpg)
Mario Kart
Multiple regression also allows for categorical
variables with many levels, though we do not have
any such variables in this analysis, and we save
these details for a second or third course.
8
price cond_new stock_photo duration wheels
1 51.55 1 1 3 1
2 37.04 0 1 7 1
… … … … … …
140 38.76 0 0 7 0
141 54.51 1 1 1 2
![Page 9: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/9.jpg)
Mario Kart
Let's fit a linear regression model with the game's
condition as a predictor of auction price. The model
may be written as
Results of this model are shown below.
9
Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.8711 0.8140 52.67 0.0000
cond_new 10.8996 1.2583 8.66 0.0000
df = 139
![Page 10: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/10.jpg)
Mario Kart
A scatterplot of the total auction price against the
game's condition is shown, which takes value 1 when
the game is new and 0 if the game is used. The least
squares line is also shown.
10
![Page 11: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/11.jpg)
Example 1
Does the linear model seem reasonable?
11
![Page 12: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/12.jpg)
Example 2
Interpret the coefficient for the game's condition in the
model. Is this coefficient significantly different from 0?
12
Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.8711 0.8140 52.67 0.0000
cond_new 10.8996 1.2583 8.66 0.0000
df = 139
![Page 13: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/13.jpg)
Multiple Regression
Sometimes there are underlying structures or
relationships between predictor variables. For
instance, new games sold on eBay tend to come
with more Wii wheels, which may have led to
higher prices for those auctions.
We would like to fit a model that includes all
potentially important variables simultaneously.
13
![Page 14: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/14.jpg)
Multiple Regression
This would help us evaluate the relationship
between a predictor variable and the outcome
while controlling for the potential influence of other
variables. This is the strategy used in multiple
regression.
While we remain cautious about making any
causal interpretations using multiple regression,
such models are a common first step in providing
evidence of a causal connection.
14
![Page 15: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/15.jpg)
Multiple Regression
We want to construct a model that accounts for not
only the game condition, but simultaneously
accounts for three other variables: stock_photo,
duration, and wheels.
15
price cond_new stock_photo duration wheels
1 51.55 1 1 3 1
2 37.04 0 1 7 1
… … … … … …
140 38.76 0 0 7 0
141 54.51 1 1 1 2
![Page 16: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/16.jpg)
Multiple Regression
That is,
In this equation, y represents the total price, x1
indicates whether the game was new, x2 indicates
whether a stock photo was used, x3 was the duration
of the auction, and x4 was the number of Wii wheels
included with the game.
16
![Page 17: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/17.jpg)
Multiple Regression
We (by that, I mean computers) estimate the
parameters β0, β1, β2, β3, and β4 in the same way
as we did in the case of a single predictor.
We (again, computers) select b0, b1, b2, b3, and b4
that minimize the sum of the squared residuals:
17
141 141
22 2 2 2
1 2 141
1 1
ˆi i i
i i
SSE e e e e y y
![Page 18: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/18.jpg)
Multiple Regression
There are 141 residuals for the Mario Kart data, one
for each auction. A computer minimizes the sum of
these squared residuals, and computes point
estimates, shown in the regression table below.
18
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.2110 1.5140 23.92 0.0000
cond_new 5.1306 1.0511 4.88 0.0000
stock_photo 1.0803 1.0568 1.02 0.3085
duration –0.0268 0.1904 –0.14 0.8882
wheels 7.2852 0.5547 13.13 0.0000
df = 136
![Page 19: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/19.jpg)
Example 3
Write out the model using the point estimates below.
How many predictors are there in this model?
19
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.2110 1.5140 23.92 0.0000
cond_new 5.1306 1.0511 4.88 0.0000
stock_photo 1.0803 1.0568 1.02 0.3085
duration –0.0268 0.1904 –0.14 0.8882
wheels 7.2852 0.5547 13.13 0.0000
df = 136
![Page 20: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/20.jpg)
Example 4
What does β4, the coefficient of variable x4 (Wii
wheels), represent? What is the point estimate of β4?
20
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.2110 1.5140 23.92 0.0000
cond_new 5.1306 1.0511 4.88 0.0000
stock_photo 1.0803 1.0568 1.02 0.3085
duration –0.0268 0.1904 –0.14 0.8882
wheels 7.2852 0.5547 13.13 0.0000
df = 136
![Page 21: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/21.jpg)
Example 5
Compute the residual of the first observation using
the multiple regression equation identified in the
previous examples.
21
price cond_new stock_photo duration wheels
1 51.55 1 1 3 1
2 37.04 0 1 7 1
… … … … … …
140 38.76 0 0 7 0
141 54.51 1 1 1 2
![Page 22: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/22.jpg)
Example 6
What do the two models below have in common?
Where do they differ?
22
Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.8711 0.8140 52.67 0.0000
cond_new 10.8996 1.2583 8.66 0.0000
df = 139
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.2110 1.5140 23.92 0.0000
cond_new 5.1306 1.0511 4.88 0.0000
stock_photo 1.0803 1.0568 1.02 0.3085
duration –0.0268 0.1904 –0.14 0.8882
wheels 7.2852 0.5547 13.13 0.0000
df = 136
![Page 23: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/23.jpg)
Example 7
We estimated a coefficient for cond_new of b1 = 10.90
with a standard error of SE(b1) = 1.26 when using
simple linear regression. Why might there be a
difference between that estimate and the one in the
multiple regression setting?
23
1 1
simple linear regression 10.90 1.26
multiple regression 5.13 1.05
b SE b
![Page 24: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/24.jpg)
Example 7 answer
When we estimated the connection of the outcome
price and predictor cond_new using simple linear
regression, we were unable to control for other
variables like the number of Wii wheels included in the
auction.
24
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.2110 1.5140 23.92 0.0000
cond_new 5.1306 1.0511 4.88 0.0000
stock_photo 1.0803 1.0568 1.02 0.3085
duration –0.0268 0.1904 –0.14 0.8882
wheels 7.2852 0.5547 13.13 0.0000
df = 136
![Page 25: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/25.jpg)
Example 7 answer
We say that the simple linear model was biased by the
confounding variable wheels. When we use both
variables, this particular underlying and unintentional
bias is reduced or eliminated (though bias from other
confounding variables may still remain).
25
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.2110 1.5140 23.92 0.0000
cond_new 5.1306 1.0511 4.88 0.0000
stock_photo 1.0803 1.0568 1.02 0.3085
duration –0.0268 0.1904 –0.14 0.8882
wheels 7.2852 0.5547 13.13 0.0000
df = 136
![Page 26: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/26.jpg)
Collinearity
The previous example describes a common issue in
multiple regression: correlation among predictor
variables. We say the two predictor variables are
collinear when they are correlated, and this
collinearity complicates model estimation.
While it is impossible to prevent collinearity from
arising in observational data, experiments are usually
designed to prevent predictors from being collinear.
26
![Page 27: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/27.jpg)
Example 8
The estimated value of the intercept is 36.21, and one
might be tempted to make some interpretation of this
coefficient, such as, it is the model's predicted price
when each of the variables take value zero: the game
is used, the primary image is not a stock photo, the
auction duration is zero days, and there are no wheels
included.
Is there any value gained by making this
interpretation?
27
![Page 28: 146 41 multiple_regression](https://reader031.vdocuments.net/reader031/viewer/2022030211/58a3c1151a28ab62218b6973/html5/thumbnails/28.jpg)
Example 8 answer
Three of the variables (cond_new, stock photo,
and wheels) do take value 0, but the auction
duration is always one or more days. If the auction
is not up for any days, then no one can bid on it!
That means the total auction price would always
be zero for such an auction; the interpretation of
the intercept in this setting is not insightful.
28