2 - multiple regression models
Types of regression models
Regression Models
• Simple: 1° order, 2° order, Higher order
• Multiple: 1° order, 2° order, Interaction, Higher order
A quadratic second order model
E(Y) = β0 + β1x + β2x²
• Interpretation of model parameters:
• β0: y-intercept. The value of E(Y) when x = 0
• β1 : is the shift parameter;
• β2 : is the rate of curvature;
Example with quadratic terms
[Figure: scatter plot of y against x; x runs from 2 to 10, y from 0 to 100]
The true model, supposedly unknown, is
Yi = 2 + xi² + εi, with εi ~ N(0, 2)
Data: (x,y). See SQM.sav
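The comparison carried out in the next slides can be imitated on simulated data. The sketch below is only an illustration: SQM.sav is not used, the x values are drawn uniformly on an assumed range of 1 to 10, N(0, 2) is read as variance 2, and `simple_fit` is a hypothetical helper name.

```python
import math
import random

def simple_fit(u, y):
    """Least-squares fit of y = b0 + b1*u; returns (b0, b1, SSE)."""
    n = len(u)
    u_bar = sum(u) / n
    y_bar = sum(y) / n
    sxy = sum((ui - u_bar) * (yi - y_bar) for ui, yi in zip(u, y))
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    b1 = sxy / sxx
    b0 = y_bar - b1 * u_bar
    sse = sum((yi - b0 - b1 * ui) ** 2 for ui, yi in zip(u, y))
    return b0, b1, sse

random.seed(0)   # arbitrary seed, only for reproducibility
n = 105          # same sample size as the slides (total df = 104)
x = [random.uniform(1, 10) for _ in range(n)]              # assumed range of x
y = [2 + xi ** 2 + random.gauss(0, math.sqrt(2)) for xi in x]

# Model 1 regresses y on x; Model 2 regresses y on x squared
_, _, sse_linear = simple_fit(x, y)
b0_sq, b1_sq, sse_square = simple_fit([xi ** 2 for xi in x], y)
print("SSE with x:", sse_linear, " SSE with x squared:", sse_square)
```

Because the true mean is quadratic, the fit on x squared should leave a much smaller SSE than the fit on x.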
Model 1: E(Y) = β0 + β1x
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.973   .947       .947                6.60994
Predictors: (Constant), x

ANOVA
             Sum of Squares   df    Mean Square   F          Sig.
Regression   80624.915        1     80624.915     1845.332   .000
Residual     4500.202         103   43.691
Total        85125.117        104
Predictors: (Constant), x
Dependent Variable: y

Coefficients
             B         Std. Error   Beta   t         Sig.
(Constant)   -19.959   1.483               -13.454   .000
x            10.744    .250         .973   42.957    .000
Dependent Variable: y
[Figure: scatter plot of y against x with the fitted line y = -19.96 + 10.74 * x; R-Square = 0.95]
Model 2: E(Y) = β0 + β1x²
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.996   .991       .991                2.68707
Predictors: (Constant), XSquare

ANOVA
             Sum of Squares   df    Mean Square   F           Sig.
Regression   84381.422        1     84381.422     11686.632   .000
Residual     743.695          103   7.220
Total        85125.117        104
Predictors: (Constant), XSquare
Dependent Variable: y

Coefficients
             B       Std. Error   Beta   t         Sig.
(Constant)   2.340   .417                5.608     .000
XSquare      .997    .009         .996   108.105   .000
Dependent Variable: y
Smaller variance and SE
[Figure: scatter plot of y against XSquare with the fitted line y = 2.34 + 1.00 * XSquare; R-Square = 0.99]
Model 3: E(Y) = β0 + β1x + β2x²
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.996   .991       .991                2.66608
Predictors: (Constant), XSquare, x

ANOVA
             Sum of Squares   df    Mean Square   F          Sig.
Regression   84400.103        2     42200.052     5936.999   .000
Residual     725.014          102   7.108
Total        85125.117        104
Predictors: (Constant), XSquare, x
Dependent Variable: y

Coefficients
             B       Std. Error   Beta    t        Sig.
(Constant)   4.177   1.206                3.463    .001
x            -.830   .512         -.075   -1.621   .108
XSquare      1.071   .046         1.069   23.046   .000
Dependent Variable: y
A third order model with 1 IV
E(Y) = β0 + β1x + β2x² + β3x³
[Figure: curves of Y against X1 for the third order model, with panels labelled β3 < 0 and β3 > 0]
Use with caution given the numerical problems that could arise.
First-Order model in k Quantitative variables
E(Y)=β0+β1x1+β2 x2 + ... + βk xk
Interpretation of model parameters:
β0: y-intercept. The value of E(Y) when x1 = x2 =...= xk= 0
β1: change in E(Y) for a 1-unit increase in x1 when x2,.., xk are held fixed;
β2: change in E(Y) for a 1-unit increase in x2 when x1, x3,..., xk are held fixed;
...
A bivariate model
Changing x2 changes only the y-intercept.
E(Y)=β0+β1x1+β2 x2
In the first order model a 1-unit change in one independent variable will have the same effect on the mean value of y regardless of the other independent variables.
[Figure: response plane E(Y) = β0 + β1X1i + β2X2i drawn over the (X1, X2) axes, with intercept β0 on the Y axis and an observed Y at the point (X1i, X2i), where Yi = β0 + β1X1i + β2X2i + εi]
Example: executive salaries
• Y = Annual salary (in dollars)
• x1 = Years of experience
• x2 = Years of education
• x3 = Gender : 1 if male; 0 if female
• x4 = Number of employees supervised
• x5 = Corporate assets (in millions of dollars)
Data: ExecSal.sav
E(Y) = β0 + β1x1 + β2x2 + β4x4 + β5x5
Do not consider x3 (Gender) for the moment.
Executive salaries: Computer Output
Multiple regression
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.870   .757       .747                12685.309
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised

Simple regression
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.783   .613       .609                15760.006
a. Predictors: (Constant), Years of Experience
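Coefficient of determination
$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
$$\underbrace{\sum_{i=1}^{n}(y_i-\bar{y})^2}_{SST\ \text{(Total)}} = \underbrace{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2}_{SSR\ \text{(Regression)}} + \underbrace{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}_{SSE\ \text{(Error)}}$$
The coefficient R² is computed exactly as in the simple regression case.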
A drawback of R²: it never decreases when variables are added, even if these are NOT relevant to the problem.
A solution: Adjusted R²
– Each additional variable reduces adjusted R², unless SSE decreases enough to compensate
$$R_a^2 = 1 - \frac{n-1}{n-k-1}\cdot\frac{SSE}{SST} \le 1 - \frac{SSE}{SST} = R^2$$
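As a quick check of the adjusted R² formula, the sketch below plugs in the R Square values printed in the executive salaries output (n = 100, since the total df is 99). The function name `adjusted_r2` is only illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors (plus intercept) and n observations."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Multiple regression with four predictors: R Square = .757
multiple = adjusted_r2(0.757, n=100, k=4)
# Simple regression on Years of Experience: R Square = .613
simple = adjusted_r2(0.613, n=100, k=1)
print(round(multiple, 3), round(simple, 3))
```

Both values agree with the Adjusted R Square column of the output (.747 and .609) up to rounding of the inputs.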
Adjusted R² and estimate of the variance σ²
An unbiased estimator of the variance σ² is computed as
$$s^2 = \hat{\sigma}^2 = \frac{SSE}{n-k-1} = \frac{\sum_{i}\hat{\varepsilon}_i^2}{n-k-1}$$
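The variance estimate can be verified against any of the printed outputs. The sketch below uses the residual sum of squares of the linear fit in the quadratic example (SSE = 4500.202, n = 105, k = 1); `residual_std_error` is a hypothetical helper name.

```python
import math

def residual_std_error(sse, n, k):
    """Return (s^2, s): the variance estimate SSE/(n-k-1) and its square root."""
    mse = sse / (n - k - 1)
    return mse, math.sqrt(mse)

# Linear model of the quadratic example: residual df = 105 - 1 - 1 = 103
mse, s = residual_std_error(4500.202, n=105, k=1)
print(mse, s)
```

The two numbers correspond to the Residual Mean Square (43.691) and the Std. Error of the Estimate (6.60994) in that output.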
Executive salaries: Computer Output (2)
Coefficients
                                  B            Std. Error   Beta   t        Sig.
(Constant)                        -37082.148   17052.089           -2.175   .032
Years of Experience               2696.360     173.647      .785   15.528   .000
Years of Education                2656.017     563.476      .243   4.714    .000
Number of Employees supervised    41.092       7.807        .272   5.264    .000
Corporate assets (in million $)   244.569      83.420       .149   2.932    .004
Dependent Variable: Annual salary in $
The first column lists the variables; the t and Sig. columns report the t-tests.
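To make the "held fixed" interpretation concrete, the sketch below evaluates the fitted first-order equation using the unstandardized coefficients B from the output. The executive profile (10 years of experience, 16 of education, 100 employees, 150 million in assets) is invented purely for illustration.

```python
# Unstandardized coefficients B from the executive salaries output
COEF = {
    "const": -37082.148,
    "experience": 2696.360,   # x1, years of experience
    "education": 2656.017,    # x2, years of education
    "supervised": 41.092,     # x4, number of employees supervised
    "assets": 244.569,        # x5, corporate assets in million $
}

def predicted_salary(experience, education, supervised, assets):
    """Estimated E(Y) from the first-order model in x1, x2, x4, x5."""
    return (COEF["const"]
            + COEF["experience"] * experience
            + COEF["education"] * education
            + COEF["supervised"] * supervised
            + COEF["assets"] * assets)

# Hypothetical executive, not taken from ExecSal.sav
base = predicted_salary(10, 16, 100, 150)
one_more_year = predicted_salary(11, 16, 100, 150)
print(base, one_more_year - base)
```

The difference between the two predictions equals the coefficient of Years of Experience whatever values the other variables take, which is the defining property of a first-order model.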
Testing overall significance: the F-test
• 1. Shows if there is a linear relationship between all X variables together and Y
• 2. Uses the F test statistic
• 3. Hypotheses
– H0: β1 = β2 = ... = βk = 0
•No linear relationship
– Ha: At least one coefficient is not 0
•At least one X variable affects Y
The F-test for a single coefficient is equivalent to the t-test.
ANOVA table
             Sum of Squares   df   Mean Square   F        Sig.
Regression   4.766E10         4    1.192E10      74.045   .000
Residual     1.529E10         95   1.609E8
Total        6.295E10         99
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised
b. Dependent Variable: Annual salary in $
Reading the table:
• F is the F-statistic
• The Residual Mean Square is the MSE (mean square error), the estimate of the variance
• Regression df = k, the number of regression slopes
• Total df = n - 1, where n is the number of observations
• Sig. is the p-value of the F-test
Decision: reject H0, i.e. at least one slope differs from 0 and the model is retained.
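The F-statistic is the ratio of the regression mean square to the residual mean square. The sketch below recomputes it from the sums of squares printed in the ANOVA table; since those are shown with only four significant digits, the result matches the reported 74.045 only approximately. `overall_f` is an illustrative name.

```python
def overall_f(ssr, sse, n, k):
    """F = (SSR / k) / (SSE / (n - k - 1)) for the test H0: all k slopes are 0."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

# Executive salaries: SSR = 4.766E10, SSE = 1.529E10, n = 100, k = 4
f_exec = overall_f(ssr=4.766e10, sse=1.529e10, n=100, k=4)
print(round(f_exec, 2))
```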
Interaction (second order) model
E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2
• Interpretation of model parameters:
• β0: y-intercept. The value of E(Y) when x1 = x2 = 0
• β1+ β3 x2 : change in E(Y) for a 1-unit increase in x1 when x2 is held fixed;
• β2 + β3 x1 : change in E(Y) for a 1-unit increase in x2 when x1 is held fixed;
• β3: controls the rate of change of the surface.
Interaction (second order) model
Contour lines are not parallel
E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2
The effect of one variable depends on the level of the other
Example: Antique grandfather clocks auction
Clocks are sold at an auction on competitive offers. Data are:
– Y : auction price in dollars
– X1: age of clocks
– X2: number of bidders
Model 1: E(Y) = β0 + β1x1 + β2x2
Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2
Data: GFCLOCKS.sav
Data summaries
Descriptive Statistics
                     N    Minimum   Maximum   Mean      Std. Deviation   Skewness (Std. Error)   Kurtosis (Std. Error)
Age                  32   108       194       144.94    27.395           .216 (.414)             -1.323 (.809)
Bidders              32   5         15        9.53      2.840            .420 (.414)             -.788 (.809)
Price                32   729       2131      1326.88   393.487          .396 (.414)             -.727 (.809)
Valid N (listwise)   32
If data are Normal, Skewness is 0.
If data are Normal, (excess) Kurtosis is 0.
Note: Skewness and Kurtosis are not enough to establish Normality.
P-P plot for Normality
If data are Normal, points should lie along the straight line.
In this example the situation is fairly good.
Bivariate scatter-plots
[Figure: two scatter plots, one against Age (120 to 180) and one against Bidders (6 to 14); the vertical axis runs from 800 to 2000]
Model 1: E(Y) = β0 + β1x1 + β2x2
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.945   .892       .885                133.485
Predictors: (Constant), Bidders, Age

ANOVA
             Sum of Squares   df   Mean Square   F         Sig.
Regression   4283062.960      2    2141531.480   120.188   .000
Residual     516726.540       29   17818.157
Total        4799789.500      31
Predictors: (Constant), Bidders, Age
Dependent Variable: Price

Coefficients
             B           Std. Error   Beta   t        Sig.
(Constant)   -1338.951   173.809             -7.704   .000
Age          12.741      .905         .887   14.082   .000
Bidders      85.953      8.729        .620   9.847    .000
Dependent Variable: Price
Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.977   .954       .949                88.915
Predictors: (Constant), AgeBid, Age, Bidders

ANOVA
             Sum of Squares   df   Mean Square   F         Sig.
Regression   4578427.367      3    1526142.456   193.041   .000
Residual     221362.133       28   7905.790
Total        4799789.500      31
Predictors: (Constant), AgeBid, Age, Bidders
Dependent Variable: Price

Coefficients
             B         Std. Error   Beta    t        Sig.
(Constant)   320.458   295.141              1.086    .287
Age          .878      2.032        .061    .432     .669
Bidders      -93.265   29.892       -.673   -3.120   .004
AgeBid       1.298     .212         1.369   6.112    .000
Dependent Variable: Price
Interpreting interaction models
The coefficient for the interaction term is significant. If an interaction term is present, the corresponding first order terms must also be included in order to interpret the model correctly.
In the example a careless analyst could estimate the effect of Bidders as negative, since b2 = -93.26.
Since an interaction term is present, the slope estimate for Bidders (x2) is
b2 + b3x1
For x1 = 150 (age) the estimated slope for Bidders is, with b3 rounded to 1.3,
-93.26 + 1.3 (150) = 101.74
Note: b = β̂, the estimate of β.
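The slope b2 + b3x1 can be tabulated over the observed ages. The sketch below uses the unrounded coefficients from the Model 2 output (b2 = -93.265, b3 = 1.298), so the value at age 150 differs slightly from the hand calculation with b3 rounded to 1.3; the function and variable names are illustrative.

```python
B2 = -93.265   # coefficient of Bidders in the interaction model
B3 = 1.298     # coefficient of AgeBid (Age * Bidders)

def bidders_slope(age):
    """Estimated change in price per additional bidder for a clock of the given age."""
    return B2 + B3 * age

slope_at_150 = bidders_slope(150)
# Age at which the estimated Bidders slope would change sign
zero_slope_age = -B2 / B3

# 108 and 194 are the minimum and maximum Age in the descriptive statistics
for age in (108, 150, 194):
    print(age, round(bidders_slope(age), 2))
print("slope is zero at age", round(zero_slope_age, 1))
```

The sign change falls below the youngest clock in the sample, so within the observed data the estimated effect of an extra bidder is positive even though b2 itself is negative.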
Models with qualitative X’s
Regression models can also include qualitative (or categorical) independent variables (QIV).
The categories of a QIV are called levels
Since the levels of a QIV are not measured on a natural numerical scale, we need to use a specific type of coding in order to avoid introducing fictitious linear relations in the model.
Coding is done by using IV which assume only two values: 0 or 1.
These coded IV are called dummy variables
Models with QIV
• Suppose we want to model Income (Y) as a function of Sex (x) -> use coded, or dummy, variables
x = 1 if Male, x = 0 if Female
E(Y) = β0 + β1x
E(Y) = β0 + β1 if x = 1, i.e. Male
E(Y) = β0 if x = 0, i.e. Female
β0 is the mean for the base level, i.e. Female is the reference category
β1 is the additional effect if Male
In this simple model, only the means for the two groups are modeled.
QIV with q levels
As a general rule, if a QIV has q levels we need q-1 dummies for coding. The uncoded level is the reference one.
Example: a QIV has three levels, A, B and C
Define x1 = 1 if level A, x1 = 0 if not
x2 = 1 if level B, x2 = 0 if not
C is the reference level
Model: E(Y) = β0 + β1x1 + β2x2
Interpreting β’s
β0 = μC (mean for base level C)
β1 = μA - μC (additional effect with respect to C if level A)
β2 = μB - μC (additional effect with respect to C if level B)
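The q-1 coding rule can be sketched in a few lines. The helper `dummy_code` and the group means in `mu` are hypothetical and serve only to show that the model with two dummies reproduces the three group means.

```python
def dummy_code(level, levels, reference):
    """Return the q-1 dummies for `level`; the reference level maps to all zeros."""
    coded = [lv for lv in levels if lv != reference]
    return tuple(1 if level == lv else 0 for lv in coded)

levels = ["A", "B", "C"]
codes = {lv: dummy_code(lv, levels, reference="C") for lv in levels}
print(codes)

# Hypothetical group means, chosen only for illustration
mu = {"A": 12.0, "B": 9.0, "C": 7.0}
beta0 = mu["C"]             # mean of the reference level
beta1 = mu["A"] - mu["C"]   # additional effect of level A
beta2 = mu["B"] - mu["C"]   # additional effect of level B

def expected(level):
    """E(Y) = beta0 + beta1*x1 + beta2*x2 for the given level."""
    x1, x2 = codes[level]
    return beta0 + beta1 * x1 + beta2 * x2

for lv in levels:
    print(lv, expected(lv))
```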
Models with dummies
Dummies can be used in combination with any other dummies and quantitative X’s to construct models with first order effects (or main effects) and interactions to test hypotheses of interest.
Even if models which consider only dummy variables do in practice estimate the means of various groups, the testing machinery of the regression setup can be useful for group comparisons.
In order to define dummies in SPSS see “Computing dummy vars in SPSS.ppt”
Example: executive salaries
A management consulting firm has developed a regression model in order to analyze the salary structure of executives.
• Y = Annual salary (in dollars)
• x1 = Years of experience
• x2 = Years of education
• x3 = Gender : 1 if male; 0 if female
• x4 = Number of employees supervised
• x5 = Corporate assets (in millions of dollars)
Data: ExecSal.sav
A simple model: E(Y) = β0 + β3x3
This model estimates the means of the two groups (M,F)
We want to test whether the difference in means is significant, i.e. not due to chance.
[Figure: plot comparing the Male group and the Female group]
Regression Output
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.392   .153       .145                23320.282
Predictors: (Constant), Gender

ANOVA
             Sum of Squares    df   Mean Square      F        Sig.
Regression   9651865066.845    1    9651865066.845   17.748   .000
Residual     53295882433.156   98   543835535.032
Total        62947747500.001   99
Predictors: (Constant), Gender
Dependent Variable: Annual salary in $

Coefficients
             B           Std. Error   Beta   t        Sig.   95% C.I. for B: Lower Bound   Upper Bound
(Constant)   83847.059   3999.395            20.965   .000   75910.389                     91783.729
Gender       20739.305   4922.915     .392   4.213    .000   10969.940                     30508.670
Dependent Variable: Annual salary in $
• The salary difference between groups is significant (Sig. for Gender)
• B for Gender is the mean increment for Male
• The confidence interval for Gender is the C.I. for the mean increment
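With a single dummy predictor, the output can be read back as two group means, and the t-test on the Gender coefficient coincides with the overall F-test (F = t²). The sketch below checks this with the printed coefficients; the variable names are illustrative.

```python
b0 = 83847.059       # (Constant): estimated mean salary for the reference group (female)
b1 = 20739.305       # Gender: estimated mean increment for male
se_b1 = 4922.915     # standard error of the Gender coefficient

mean_female = b0
mean_male = b0 + b1

t_gender = b1 / se_b1        # t statistic for H0: beta3 = 0
f_from_t = t_gender ** 2     # equals the ANOVA F when only one slope is tested

print(mean_female, mean_male)
print(round(t_gender, 3), round(f_from_t, 3))
```

The squared t statistic reproduces the F value of 17.748 in the ANOVA table up to rounding.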
Model 2: E(Y) = β0 + β1x1 + β3x3
Model 2 considers the same slope but different intercepts.
It seems that the two groups are separated.
If x3 = 0 (female) then E(Y) = β0 + β1x1
If x3 = 1 (male) then E(Y) = β0 + β3 + β1x1
Computer output for model 2
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.860   .740       .735                12981.615
Predictors: (Constant), Years of Experience, Gender

ANOVA
             Sum of Squares    df   Mean Square       F         Sig.
Regression   46601081714.527   2    23300540857.264   138.264   .000
Residual     16346665785.474   97   168522327.685
Total        62947747500.001   99
Predictors: (Constant), Years of Experience, Gender
Dependent Variable: Annual salary in $

Coefficients
                      B           Std. Error   Beta   t        Sig.   95% C.I. for B: Lower Bound   Upper Bound
(Constant)            50614.312   3161.279            16.011   .000   44340.048                     56888.576
Gender                18894.215   2743.253     .357   6.888    .000   13449.618                     24338.812
Years of Experience   2633.831    177.875      .767   14.807   .000   2280.799                      2986.863
Dependent Variable: Annual salary in $
• R square improved greatly
• The new intercept for Male is significant
• In this model the effect of experience is assumed equal for the two groups
Model 3: E(Y) = β0 + β1x1 + β3x3 + β4x1x3
With this model we want to test whether gender and experience interact, i.e. whether male salaries tend to grow at a faster (or slower) rate with experience.
If x3 = 0 (female) then E(Y) = β0 + β1x1
If x3 = 1 (male) then E(Y) = (β0 + β3) + (β1 + β4)x1
(β0 + β3) is the new intercept for male; (β1 + β4) is the new slope for male.
Remark: running the regression for the two groups together gives more degrees of freedom (a larger n) for estimating the parameters and the model variance.
Model 3: E(Y) = β0 + β1x1 + β3x3 + β4x1x3
Model 3 considers different slopes and different intercepts.
Computer output for model 3
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.868   .754       .746                12700.080
Predictors: (Constant), ExpGender, Years of Experience, Gender

Coefficients
                      B           Std. Error   Beta   t        Sig.   95% C.I. for B: Lower Bound   Upper Bound
(Constant)            58049.768   4461.179            13.012   .000   49194.397                     66905.139
Gender                7798.504    5497.470     .147   1.419    .159   -3113.888                     18710.896
Years of Experience   2044.541    308.565      .595   6.626    .000   1432.045                      2657.036
ExpGender             864.122     373.653      .301   2.313    .023   122.426                       1605.818
Dependent Variable: Annual salary in $
There is evidence that salaries for the two groups grow at different rates with experience.
Estimated lines:
Ŷ = 58049.8 + 2044.5*(Years of Experience) for female
Ŷ = 65848.3 + 2908.7*(Years of Experience) for male
A complete second order model
E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
• Interpretation of model parameters:
• β0: y-intercept. The value of E(Y) when x1 = x2 = 0
• β1 and β2 : shifts along the x1 and x2 axes;
• β3 : rotation of the surface;
• β4 and β5 : control the rate of curvature.
Back to Executive salaries
What if we suspect that the rate of growth changes with experience and has opposite signs for M and F?
x1 = Years of experience
x3 = Gender (1 if Male)
Model 4: E(Y) = β0 + β1x1 + β2x3 + β3x1x3 + β4x1²
Model 5: E(Y) = β0 + β1x1 + β2x3 + β3x1x3 + β4x1² + β5x3x1²
Note: x3² = x3 since it is a dummy
Comparing Model 4 and 5
Model 4
If x3 = 0 (female) then
E(Y) = β0 + β1x1 + β4x1²
If x3 = 1 (male) then
E(Y) = (β0 + β2) + (β1 + β3)x1 + β4x1²
Different intercept and slope for M and F but same curvature
Model 5
If x3 = 0 (female) then
E(Y) = β0 + β1x1 + β4x1²
If x3 = 1 (male) then
E(Y) = (β0 + β2) + (β1 + β3)x1 + (β4 + β5)x1²
Different intercept, slope and curvature for M and F
Model 5: computer output
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.875   .766       .754                12507.735
a. Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen

ANOVA
             Sum of Squares   df   Mean Square   F        Sig.
Regression   4.824E10         5    9.648E9       61.673   .000
Residual     1.471E10         94   1.564E8
Total        6.295E10         99
a. Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen
b. Dependent Variable: Annual salary in $

Coefficients
                      B           Std. Error   Beta    t        Sig.
(Constant)            52391.973   6497.971             8.063    .000
Years of Experience   3373.970    1165.248     .982    2.895    .005
Gender                21122.152   8285.802     .399    2.549    .012
ExpGen                -2081.897   1459.842     -.724   -1.426   .157
ExpSqu                -53.181     45.001       -.422   -1.182   .240
Exp2Gen               112.836     54.950       .904    2.053    .043
a. Dependent Variable: Annual salary in $
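The Model 5 coefficients can be combined into one fitted curve per group, following the intercept, slope and curvature expressions given for x3 = 0 and x3 = 1. This is a sketch with illustrative names; it only rearranges the printed estimates and says nothing about whether the individual terms are significant.

```python
# Unstandardized coefficients B from the Model 5 output
B = {
    "const": 52391.973,
    "exp": 3373.970,        # Years of Experience (x1)
    "gender": 21122.152,    # Gender (x3)
    "exp_gen": -2081.897,   # ExpGen (x1 * x3)
    "exp_sq": -53.181,      # ExpSqu (x1 squared)
    "exp2_gen": 112.836,    # Exp2Gen (x3 * x1 squared)
}

def curve_for(gender):
    """Return (intercept, slope, curvature) of the fitted curve for gender = 0 or 1."""
    intercept = B["const"] + B["gender"] * gender
    slope = B["exp"] + B["exp_gen"] * gender
    curvature = B["exp_sq"] + B["exp2_gen"] * gender
    return intercept, slope, curvature

female = curve_for(0)
male = curve_for(1)
print("female:", female)
print("male:  ", male)
```

The estimated curvature is negative for the female group and positive for the male group, which is the "opposite signs" pattern Model 5 was introduced to allow.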
Which model is preferable? Model 3 or model 5?
A test for comparing nested models
Two models are nested if one model contains all the terms of the other model and at least one additional term.
The more complex of the two models is called the complete (or full) model.
The other is called the reduced (or restricted) model.
Example: model 1 is nested in model 2
Model 1: E(Y) = β0 + β1x1 + β2x2 + β3x1x2
Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
To compare the two models we are interested in testing
H0: β4 = β5 = 0, vs. H1: at least one, β4 or β5, differs from 0
F-test for comparing nested models
Reduced model: E(Y) = β0 + β1x1 + … + βgxg
Complete model: E(Y) = β0 + β1x1 + … + βgxg + βg+1xg+1 + … + βkxk
To test H0: βg+1 = … = βk = 0
H1: at least one of the parameters being tested is not 0
Compute
$$F = \frac{(SSE_R - SSE_C)/(k-g)}{MSE_C}$$
Reject H0 when F > Fα, where Fα is the level α critical point of an F distribution with (k-g, n-(k+1)) d.f.
F-test for nested models
Where:
SSE_R = Sum of squared errors for the reduced model;
SSE_C = Sum of squared errors for the complete model;
MSE_C = Mean square error for the complete model;
Remark:
k – g = number of parameters tested
k + 1 = number of parameters in the complete model
n = total sample size
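The statistic is easy to compute once the two residual sums of squares are known. The sketch below wraps the formula in a hypothetical helper, `partial_f`, and applies it to the residual sums of squares reported in the ANOVA tables of the shipping costs example (complete model: SSE = 2.745 with k = 5; reduced model: SSE = 6.633 with g = 3; n = 20).

```python
def partial_f(sse_reduced, sse_complete, n, k, g):
    """Partial F statistic and its (numerator, denominator) degrees of freedom."""
    df_num = k - g
    df_den = n - (k + 1)
    mse_complete = sse_complete / df_den
    f = ((sse_reduced - sse_complete) / df_num) / mse_complete
    return f, (df_num, df_den)

f_ship, df_ship = partial_f(sse_reduced=6.633, sse_complete=2.745, n=20, k=5, g=3)
print(round(f_ship, 2), df_ship)
```

The value agrees with the F Change of 9.917 printed by SPSS up to rounding of the sums of squares.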
Compute partial F-tests with SPSS
1. Enter your complete model in the Regression dialog box – choose the Method “Enter”
2. Click on “Next”
3. In the new box for Independent variables, enter those you want to remove (i.e. those you’d like to test) – choose the Method “Remove”
4. In the “Statistics” option select “R squared change”
5. Ok.
Applying the F-test
Let us use the F-test to compare Model 3 and Model 5 in the executive salaries example.
Model 3: E(Y) = β0 + β1x1 + β2x3 + β3x1x3
Model 5: E(Y) = β0 + β1x1 + β2x3 + β3x1x3 + β4x1² + β5x3x1²
Note that Model 3 is nested in Model 5
Apply the F-test for H0: β4 = β5 = 0
Computer output
Variables Entered/Removed
Model   Variables Entered                                            Variables Removed     Method
1       Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen (a)     .                     Enter
2       . (a)                                                        Exp2Gen, ExpSqu (b)   Remove
a. All requested variables entered.
b. All requested variables removed.
c. Dependent Variable: Annual salary in $

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .875   .766       .754                12507.735                    .766              61.673     5     94    .000
2       .868   .754       .746                12700.080                    -.012             2.488      2     94    .089
a. Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen
b. Predictors: (Constant), Gender, Years of Experience, ExpGen
In the Model 2 row, F Change is the F-statistic and Sig. F Change is the p-value of the F-test.
Do NOT reject H0: β4 = β5 = 0 (p = .089), i.e. Model 3 is preferred.
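The F Change value can be reproduced by hand even where the residual sums of squares are not printed in full, because SSE = s²(n-k-1), with s the Std. Error of the Estimate. The sketch below recovers both SSEs this way (n = 100) and recomputes the partial F; the helper name is illustrative.

```python
def sse_from_std_error(s, n, k):
    """Recover SSE from the standard error of the estimate: SSE = s^2 * (n - k - 1)."""
    return s ** 2 * (n - k - 1)

n = 100
sse_model3 = sse_from_std_error(12700.080, n, k=3)   # reduced model, 3 predictors
sse_model5 = sse_from_std_error(12507.735, n, k=5)   # complete model, 5 predictors

mse_model5 = sse_model5 / (n - 5 - 1)
f_change = ((sse_model3 - sse_model5) / (5 - 3)) / mse_model5
print(round(f_change, 3))
```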
A quadratic model example: Shipping costs
– Y : cost of shipment in dollars
– X1: package weight in pounds
– X2: distance shipped in miles
Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
Data: Express.sav
Although a regional delivery service bases the charge for shipping a package on the package weight and distance shipped,
its profit per package depends on the package size (volume of space it occupies) and the size and nature of the delivery truck.
The company conducted a study to investigate the relationship between the cost of shipment and the variables
that control the shipping charge: weight and distance.
It is suspected that nonlinear effects may be present.
Scatter plots
[Figure: scatter plots of Cost of shipment against Weight of parcel in lbs. and against Distance shipped]
Scatter plots in multiple regression often do not reveal much information.
Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.997   .994       .992                .4428
Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped

ANOVA
             Sum of Squares   df   Mean Square   F         Sig.
Regression   449.341          5    89.868        458.388   .000
Residual     2.745            14   .196
Total        452.086          19
Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
Dependent Variable: Cost of shipment

Coefficients
                           B           Std. Error   Beta    t        Sig.
(Constant)                 .827        .702                 1.178    .259
Weight of parcel in lbs.   -.609       .180         -.316   -3.386   .004
Distance shipped           .004        .008         .062    .503     .623
Weight squared             .090        .020         .382    4.442    .001
Distance squared           1.51E-005   .000         .075    .672     .513
Weight*Distance            .007        .001         .850    11.495   .000
Dependent Variable: Cost of shipment
Distance squared is not significant; try to eliminate it.
Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1²
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.997   .994       .992                .4346
Predictors: (Constant), Weight*Distance, Distance shipped, Weight squared, Weight of parcel in lbs.

ANOVA
             Sum of Squares   df   Mean Square   F         Sig.
Regression   449.252          4    112.313       594.623   .000
Residual     2.833            15   .189
Total        452.086          19
Predictors: (Constant), Weight*Distance, Distance shipped, Weight squared, Weight of parcel in lbs.
Dependent Variable: Cost of shipment

Coefficients
                           B       Std. Error   Beta    t        Sig.
(Constant)                 .475    .458                 1.035    .317
Weight of parcel in lbs.   -.578   .171         -.300   -3.387   .004
Distance shipped           .009    .003         .141    3.421    .004
Weight squared             .087    .019         .369    4.485    .000
Weight*Distance            .007    .001         .842    11.753   .000
Dependent Variable: Cost of shipment
Applying the F-test: Shipping costs
– Y : cost of shipment in dollars
– X1: package weight in pounds
– X2: distance shipped in miles
Model 1: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2
Data: Express.sav
A company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance.
It is suspected that nonlinear effects may be present; use the F-test for nested models to decide between Model 1 and Model 2.
ANOVA Tables
Full model
             Sum of Squares   df   Mean Square   F         Sig.
Regression   449.341          5    89.868        458.388   .000
Residual     2.745            14   .196
Total        452.086          19
Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
Dependent Variable: Cost of shipment

Reduced model
             Sum of Squares   df   Mean Square   F         Sig.
Regression   445.452          3    148.484       358.154   .000
Residual     6.633            16   .415
Total        452.086          19
Predictors: (Constant), Distance shipped, Weight of parcel in lbs., Weight*Distance
Dependent Variable: Cost of shipment
F-statistic
To test H0: β4 = β5 = 0, from the ANOVA tables we have
$$F = \frac{(SSE_R - SSE_C)/2}{MSE_C} = \frac{(6.633 - 2.745)/2}{0.196} = 9.92$$
The critical value Fα (at the 5% level) for an F-distribution with 2 and 14 d.f. is 3.74.
Since F (9.92) > Fα (3.74), the null hypothesis is rejected at the 5% significance level, i.e. the model with quadratic terms is preferred over the reduced one.
Computer output
Variables Entered/Removed
Model   Variables Entered                                                                                        Variables Removed                      Method
1       Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped (a)       .                                      Enter
2       . (a)                                                                                                    Distance squared, Weight squared (b)   Remove
a. All requested variables entered.
b. All requested variables removed.
c. Dependent Variable: Cost of shipment

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .997   .994       .992                .4428                        .994              458.388    5     14    .000
2       .993   .985       .983                .6439                        -.009             9.917      2     14    .002
a. Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
b. Predictors: (Constant), Weight*Distance, Weight of parcel in lbs., Distance shipped
In the Model 2 row, F Change is the F-statistic and Sig. F Change is the p-value of the F-test.
Reject H0: β4 = β5 = 0
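Both partial F-tests in these slides have 2 numerator degrees of freedom, and in that special case the upper-tail probability of the F distribution has a simple closed form, P(F > f) = (1 + 2f/df2)^(-df2/2). The sketch below uses it to reproduce the Sig. F Change values; it does not apply to other numerator degrees of freedom, for which a statistical library would be needed. The function name is illustrative.

```python
def f_pvalue_df1_2(f, df2):
    """Upper-tail probability P(F > f) for an F distribution with (2, df2) d.f."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

# Shipping costs: F Change = 9.917 with (2, 14) d.f.
p_ship = f_pvalue_df1_2(9.917, 14)
# Executive salaries, Model 3 vs Model 5: F Change = 2.488 with (2, 94) d.f.
p_exec = f_pvalue_df1_2(2.488, 94)

print(round(p_ship, 4), round(p_exec, 4))
```

The first p-value is below 0.05 (reject H0), the second is above it (do not reject H0), in line with the two decisions reported in the slides.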
Executive salaries: a final model (?)
• Y = Annual salary (in dollars)
• x1 = Years of experience
• x2 = Years of education
• x3 = Gender : 1 if male; 0 if female
• x4 = Number of employees supervised
• x5 = Corporate assets (in millions of dollars)
Try adding other variables to model 3
Model 6: E(Y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x3 + β5x4 + β6x5
Computer Output: Model 6
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.963   .927       .922                7020.089
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender

ANOVA
             Sum of Squares   df   Mean Square   F         Sig.
Regression   5.836E10         6    9.727E9       197.384   .000
Residual     4.583E9          93   4.928E7
Total        6.295E10         99
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender

Coefficients
                                  B            Std. Error   Beta   t        Sig.
(Constant)                        -38331.331   9533.238            -4.021   .000
Years of Experience               2178.964     171.979      .634   12.670   .000
Gender                            13203.101    3137.775     .249   4.208    .000
ExpGender                         669.546      209.042      .233   3.203    .002
Years of Education                2689.594     311.914      .246   8.623    .000
Number of Employees supervised    53.239       4.470        .353   11.910   .000
Corporate assets (in million $)   180.310      46.600       .110   3.869    .000
a. Dependent Variable: Annual salary in $
Executive salaries: comparison of models
Mod.   Predictors                   Adj. R²   Standard error   F-stat
1      x1, x2, x4, x5               0.747     12685.31         74.05
2      x1, x3                       0.735     12981.62         138.26
3      x1, x3, x1∙x3                0.746     12700.08         98.09
6      x1, x2, x3, x1∙x3, x4, x5    0.922     7020.09          197.38