BMS2024 Advanced Managerial Statistics
Multiple Linear Regression (Lesson 1)
Transcript, 8/13/2019
Objectives
- apply multiple regression analysis to business decision-making situations
- analyze and interpret the computer output for a multiple regression model
- test the significance of the multiple regression model
- test the significance of the independent variables in a multiple regression model
Recap: Simple Linear Regression
- What is regression analysis?
- What is meant by a linear relationship?
- What are dependent and independent (predictor) variables?
The Multiple Regression Model
Idea: examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi).
Multiple regression model with k independent variables:
Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi
where β0 is the Y-intercept, β1, ..., βk are the population slopes, and εi is the random error.
Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with k independent variables:
Ŷi = b0 + b1X1i + b2X2i + ... + bkXki
where Ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1, ..., bk are the estimated slope coefficients.
In this chapter, we will always use Excel to obtain the regression slope coefficients and other regression summary measures.
Example: 2 Independent Variables
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
- Dependent variable: Pie sales (units per week)
- Independent variables: Price (in $), Advertising ($100s)
Data are collected for 15 weeks.
Pie Sales Example
Multiple regression equation: Sales = b0 + b1(Price) + b2(Advertising)

Week | Pie Sales | Price ($) | Advertising ($100s)
   1 |       350 |      5.50 |                 3.3
   2 |       460 |      7.50 |                 3.3
   3 |       350 |      8.00 |                 3.0
   4 |       430 |      8.00 |                 4.5
   5 |       350 |      6.80 |                 3.0
   6 |       380 |      7.50 |                 4.0
   7 |       430 |      4.50 |                 3.0
   8 |       470 |      6.40 |                 3.7
   9 |       450 |      7.00 |                 3.5
  10 |       490 |      5.00 |                 4.0
  11 |       340 |      7.20 |                 3.5
  12 |       300 |      7.90 |                 3.2
  13 |       440 |      5.90 |                 4.0
  14 |       450 |      5.00 |                 3.5
  15 |       300 |      7.00 |                 2.7
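The lesson obtains the fit from Excel; as a cross-check (an addition to this transcript, not part of the original Excel workflow), the same least-squares fit can be reproduced on the 15 weeks of data above:

```python
import numpy as np

# Pie sales data from the slide (15 weeks)
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
                  340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                        3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones(len(sales)), price, advertising])
b, residuals, rank, sv = np.linalg.lstsq(X, sales, rcond=None)
b0, b1, b2 = b  # intercept, price slope, advertising slope
print(b0, b1, b2)  # matches Excel's 306.526, -24.975, 74.131
```

The coefficients agree with the Excel Summary Output shown later in the deck.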
Estimating a Multiple Linear Regression Equation
Excel will be used to generate the coefficients and measures of goodness of fit for multiple regression.
Excel: Data / Data Analysis... / Regression
Instructions are attached (Excel Tips on Regression Analysis-2013.docx).
Multiple Linear Regression: Excel Summary Output
Multiple Regression Excel Output

Regression Statistics
  Multiple R         0.72213
  R Square           0.52148
  Adjusted R Square  0.44172
  Standard Error     47.46341
  Observations       15

ANOVA              df   SS         MS         F        Significance F
  Regression        2   29460.027  14730.013  6.53861  0.01201
  Residual (Error)  12  27033.306  2252.776
  Total             14  56493.333

               Coefficients  Standard Error  t Stat    P-value  Lower 95%  Upper 95%
  Intercept    306.52619     114.25389       2.68285   0.01993  57.58835   555.46404
  Price        -24.97509     10.83213        -2.30565  0.03979  -48.57626  -1.37392
  Advertising  74.13096      25.96732        2.85478   0.01449  17.55303   130.70888

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
The Multiple Regression Equation
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.
b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, holding advertising constant.
b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, holding price constant.
Using The Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62
Predicted sales is 428.62 pies.
Note that Advertising is in $100s, so $350 means that X2 = 3.5.
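The prediction above is plain arithmetic with the fitted coefficients; a quick check (variable names are illustrative):

```python
# Coefficients from the slide's fitted equation
b0, b1, b2 = 306.526, -24.975, 74.131

price = 5.50        # selling price in $
advertising = 3.5   # $350 of advertising, expressed in $100s

predicted_sales = b0 + b1 * price + b2 * advertising
print(round(predicted_sales, 2))  # 428.62
```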
Measures of Variation
Total variation is made up of two parts:
SST = SSR + SSE
(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)
SST = Σ(Yi - Ȳ)²
SSR = Σ(Ŷi - Ȳ)²
SSE = Σ(Yi - Ŷi)²
where:
Ȳ = average value of the dependent variable
Yi = observed values of the dependent variable
Ŷi = predicted value of Y for the given Xi value
Measures of Variation
SST = total sum of squares
  Measures the variation of the Yi values around their mean Ȳ
SSR = regression sum of squares
  Explained variation attributable to the relationship between X and Y
SSE = error sum of squares
  Variation attributable to factors other than the relationship between X and Y
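These three sums of squares can be computed directly from the pie sales data and the fitted values; a sketch (an addition to the transcript, using the data from the earlier slide):

```python
import numpy as np

sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
                  340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                        3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

X = np.column_stack([np.ones(len(sales)), price, advertising])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
y_hat = X @ b          # predicted values Ŷi
y_bar = sales.mean()   # mean Ȳ

sst = ((sales - y_bar) ** 2).sum()   # total sum of squares
ssr = ((y_hat - y_bar) ** 2).sum()   # regression (explained) sum of squares
sse = ((sales - y_hat) ** 2).sum()   # error (unexplained) sum of squares
print(sst, ssr, sse)  # ≈ 56493.3, 29460.0, 27033.3; note SST = SSR + SSE
```

The three values match the SS column of the Excel ANOVA table.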
Measures of Variation
[Figure: scatterplot with the regression line, showing at a point Xi how the total deviation (Yi - Ȳ, SST) splits into the explained part (Ŷ - Ȳ, SSR) and the unexplained part (Yi - Ŷ, SSE).]
Coefficient of Determination, R²
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variables.
The coefficient of determination is also called R-squared and is denoted as R².
R² = SSR / SST = regression sum of squares / total sum of squares
note: 0 ≤ R² ≤ 1
Composition of Total Variation
Total Variation (SST) = Explained Variation (SS Regression, SSR) + Unexplained Variation (SS Residual / SS Error, SSE)
How Strong is The Model?
R² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.
[Figure: two plots with all points lying exactly on a straight line, each labeled R² = 1.]
How Strong is The Model?
0 < R² < 1: weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X.
[Figure: two scatterplots with points loosely clustered around a line.]
How Strong is The Model?
R² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).
[Figure: scatterplot with no linear pattern, labeled R² = 0.]
Coefficient of Determination
From the Excel output (ANOVA table, as shown earlier):
R² = SSR / SST = 29460.0 / 56493.3 = 0.52148
52.1% of the variation in pie sales is explained by the variation in price and advertising. 47.9% is the unexplained variation.
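The ratio is a one-liner from the ANOVA numbers:

```python
# SSR and SST from the Excel ANOVA table
ssr, sst = 29460.027, 56493.333

r_squared = ssr / sst
print(round(r_squared, 5))  # 0.52148, matching the "R Square" cell
```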
Adjusted Coefficient of Determination (R²adj)
- R² never decreases when a new X variable is added to the model
- This can be a disadvantage when comparing models
- What is the net effect of adding a new variable?
  - We lose a degree of freedom when a new X variable is added
  - Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
Adjusted R²
Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:
R²adj = 1 - (1 - R²) (n - 1) / (n - p - 1)
(where n = sample size, p = number of independent variables)
- Penalizes excessive use of unimportant independent variables
- Smaller than R²
- Useful in comparing among models
Adjusted R² (computation)
Using the pie sales example:
R²adj = 1 - (1 - 0.5215) (15 - 1) / (15 - 2 - 1) = 0.4417
(where n = sample size, p = number of independent variables)
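The same computation, step by step:

```python
# Adjusted R² for the pie sales example, using the formula on the slide
n, p = 15, 2      # sample size, number of independent variables
r2 = 0.52148      # R² from the Excel output

r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(r2_adj, 4))  # 0.4417
```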
Adjusted R²
From the Excel output (as shown earlier): R²adj = 0.44172
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
Is the Model Significant?
F-Test for Overall Significance of the Model
Shows if there is a linear relationship between all of the X variables considered together and Y.
Use the F-test statistic.
Hypotheses:
H0: β1 = β2 = ... = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)
F-Test for Overall Significance
Test statistic:
F = MSR / MSE = (SSR / p) / (SSE / (n - p - 1))
F-distribution
Like the t-distribution, the shape of the F-distribution curve depends on the number of degrees of freedom (df).
- It has two degrees of freedom (i.e. df numerator & df denominator).
- It is right skewed, but skewness decreases as the df increase.
Characteristics:
- The F-distribution is continuous and skewed to the right.
- The values of an F-distribution are nonnegative.
Critical Value: F-distribution
α = 0.05
Critical value: Fα,df1,df2 with degrees of freedom df1 = p and df2 = n - p - 1, where p = number of predictors.
[Figure: F-distribution curve with the rejection region (area α = 0.05) to the right of Fα,df1,df2 and the non-rejection region to its left.]
Decision rule:
If the F test statistic > Fα,df1,df2, or p-value < α = 0.05, reject H0; otherwise do not reject H0.
F-Test for Overall Significance
From the Excel output (ANOVA table, as shown earlier):
F = MSR / MSE = 14730.0 / 2252.8 = 6.5386
with 2 and 12 degrees of freedom; p-value for the F-test = 0.012.
H0: β1 = β2 = 0
H1: β1 and β2 not both zero
α = .05
df1 = 2, df2 = 12
Test Statistic:
Decision:
Conclusion:
Since the F test statistic is in the rejection region (p-value