2 - multiple regression models


Upload: mike-parisi

Post on 14-Dec-2015


DESCRIPTION

Multiple Regression models.ppt

TRANSCRIPT

Page 1: 2 - Multiple Regression Models

Types of regression models

Regression Models

– Simple: 1° order, 2° order, higher order

– Multiple: 1° order, interaction, 2° order, higher order

Page 2: 2 - Multiple Regression Models

A quadratic second order model

E(Y) = β0 + β1x + β2x²

• Interpretation of model parameters:

• β0: y-intercept. The value of E(Y) when x = 0

• β1: the shift parameter;

• β2: the rate of curvature.

Page 3: 2 - Multiple Regression Models

Example with quadratic terms

[Figure: scatter plot of y against x; x runs from about 2 to 10, y from 0 to 100.]

The true model, supposedly unknown, is

Yi = 2 + xi² + εi, with εi ~ N(0, 2)

Data: (x,y). See SQM.sav
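The comparison carried out on the next slides can be sketched in Python. This is only an illustration under stated assumptions: the file SQM.sav is not available here, so the data are simulated; the range of x (1 to 10) is read off the plot axis; N(0, 2) is taken to mean variance 2; the sample size 105 matches the total degrees of freedom (104) in the ANOVA tables; the helper `fit` is a hypothetical name. The numbers will therefore not reproduce the SPSS output.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for reproducibility only
n = 105                                   # assumption: total df in the slides is 104
x = rng.uniform(1.0, 10.0, size=n)        # assumption: x range read off the plot
eps = rng.normal(0.0, np.sqrt(2.0), n)    # assumption: N(0, 2) means variance 2
y = 2.0 + x**2 + eps                      # the "true" model from the slide

def fit(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, R-squared)."""
    X = np.column_stack([np.ones_like(y)] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - sse / sst

b1, r2_1 = fit([x], y)           # Model 1: E(Y) = b0 + b1*x
b2, r2_2 = fit([x**2], y)        # Model 2: E(Y) = b0 + b1*x^2
b3, r2_3 = fit([x, x**2], y)     # Model 3: E(Y) = b0 + b1*x + b2*x^2

for name, r2 in [("Model 1", r2_1), ("Model 2", r2_2), ("Model 3", r2_3)]:
    print(f"{name}: R-squared = {r2:.4f}")
```

Model 3 nests Model 2, so its R-squared can never be lower; whether the extra linear term is worth keeping is a separate question, addressed later with t-tests and the nested F-test.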

Page 4: 2 - Multiple Regression Models

Model 1: E(Y) = β0 + β1x

Model Summary (Predictors: (Constant), x)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.973   .947       .947                6.60994

ANOVA (Dependent Variable: y)

Source       Sum of Squares   df    Mean Square   F          Sig.
Regression   80624.915        1     80624.915     1845.332   .000
Residual     4500.202         103   43.691
Total        85125.117        104

Coefficients (Dependent Variable: y)

Term         B         Std. Error   Beta   t         Sig.
(Constant)   -19.959   1.483               -13.454   .000
x            10.744    .250         .973   42.957    .000

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

Page 5: 2 - Multiple Regression Models

Linear Regression

[Figure: scatter plot of y against x with the fitted straight line.]

y = -19.96 + 10.74 * x, R-Square = 0.95

Page 6: 2 - Multiple Regression Models

Model 2: E(Y) = β0 + β1x²

Model Summary (Predictors: (Constant), XSquare)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.996   .991       .991                2.68707

ANOVA (Dependent Variable: y)

Source       Sum of Squares   df    Mean Square   F           Sig.
Regression   84381.422        1     84381.422     11686.632   .000
Residual     743.695          103   7.220
Total        85125.117        104

Coefficients (Dependent Variable: y)

Term         B       Std. Error   Beta   t         Sig.
(Constant)   2.340   .417                5.608     .000
XSquare      .997    .009         .996   108.105   .000

Smaller variance and SE than in Model 1

Page 7: 2 - Multiple Regression Models

Linear Regression

[Figure: scatter plot of y against XSquare with the fitted straight line.]

y = 2.34 + 1.00 * XSquare, R-Square = 0.99

Page 8: 2 - Multiple Regression Models

Model 3: E(Y) = β0 + β1x + β2x²

Model Summary (Predictors: (Constant), XSquare, x)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.996   .991       .991                2.66608

ANOVA (Dependent Variable: y)

Source       Sum of Squares   df    Mean Square   F          Sig.
Regression   84400.103        2     42200.052     5936.999   .000
Residual     725.014          102   7.108
Total        85125.117        104

Coefficients (Dependent Variable: y)

Term         B       Std. Error   Beta    t        Sig.
(Constant)   4.177   1.206                3.463    .001
x            -.830   .512         -.075   -1.621   .108
XSquare      1.071   .046         1.069   23.046   .000

Page 9: 2 - Multiple Regression Models

Types of regression models

Regression Models

– Simple: 1° order, 2° order, higher order

– Multiple: 1° order, interaction, 2° order, higher order

Page 10: 2 - Multiple Regression Models

A third order model with 1 IV

E(Y) = β0 + β1x + β2x² + β3x³

[Figure: panels of Y against X1 showing the shape of the cubic curve for β3 < 0 and for β3 > 0.]

Use with caution, given the numerical problems that could arise.

Page 11: 2 - Multiple Regression Models

Types of regression models

Regression Models

– Simple: 1° order, 2° order, higher order

– Multiple: 1° order, interaction, 2° order, higher order

Page 12: 2 - Multiple Regression Models

First-Order model in k Quantitative variables

E(Y)=β0+β1x1+β2 x2 + ... + βk xk

Interpretation of model parameters:

β0: y-intercept. The value of E(Y) when x1 = x2 =...= xk= 0

β1: change in E(Y) for a 1-unit increase in x1 when x2,.., xk are held fixed;

β2: change in E(Y) for a 1-unit increase in x2 when x1, x3,..., xk are held fixed;

...

Page 13: 2 - Multiple Regression Models

A bivariate model

Changing x2 changes only the y-intercept.

E(Y)=β0+β1x1+β2 x2

In the first order model a 1-unit change in one independent variable will have the same effect on the mean value of y regardless of the other independent variables.

Page 14: 2 - Multiple Regression Models

A bivariate model

[Figure: response plane E(Y) = β0 + β1X1i + β2X2i drawn over the (X1, X2) plane, with intercept β0 on the Y axis. An observed value Yi = β0 + β1X1i + β2X2i + εi at the point (X1i, X2i) lies off the plane by the error εi.]

Page 15: 2 - Multiple Regression Models

Example: executive salaries

• Y = Annual salary (in dollars)

• x1 = Years of experience

• x2 = Years of education

• x3 = Gender : 1 if male; 0 if female

• x4 = Number of employees supervised

• x5 = Corporate assets (in millions of dollars)

Data: ExecSal.sav

E(Y) = β0 + β1x1 + β2x2 + β4x4 + β5x5

Do not consider x3 (Gender) for the moment.

Page 16: 2 - Multiple Regression Models

Executive salaries: Computer Output

Model Summary

Model                 R      R Square   Adjusted R Square   Std. Error of the Estimate
Multiple regression   .870   .757       .747                12685.309
Simple regression     .783   .613       .609                15760.006

Predictors, multiple regression: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised

Predictors, simple regression: (Constant), Years of Experience

Page 17: 2 - Multiple Regression Models

Coefficient of determination

R² = Explained variation / Total variation = SSR / SST = 1 - SSE / SST

The coefficient R² is computed exactly as in the simple regression case, from the decomposition

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

SST (Total) = SSR (Regression) + SSE (Error)

A drawback of R²: it never decreases when variables are added, even if these are NOT relevant to the problem.

Page 18: 2 - Multiple Regression Models

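Adjusted R² and estimate of the variance σ²

A solution: Adjusted R²

$$R_a^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{SSE}{SST} \le 1 - \frac{SSE}{SST} = R^2$$

– Each additional variable reduces adjusted R², unless SSE decreases enough to compensate

An unbiased estimator of the variance σ² is computed as

$$s^2 = \frac{SSE}{n-k-1} = \frac{\sum_{i} \hat{\varepsilon}_i^2}{n-k-1}$$

where ε̂i = yi - ŷi are the residuals, n is the sample size and k is the number of slopes.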
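As a quick numerical check, R², adjusted R² (one minus (n - 1)/(n - k - 1) times SSE/SST) and s (the square root of SSE/(n - k - 1)) can be recomputed from the ANOVA table of the straight-line Model 1 in the quadratic example (SSE = 4500.202, SST = 85125.117, total df 104, one slope). The variable names are illustrative.

```python
import math

# Values copied from the ANOVA table of Model 1, E(Y) = b0 + b1*x
sse = 4500.202      # residual sum of squares
sst = 85125.117     # total sum of squares
n = 105             # total df is 104, so n = 105
k = 1               # number of slopes

r2 = 1.0 - sse / sst
adj_r2 = 1.0 - (n - 1) / (n - k - 1) * sse / sst
s2 = sse / (n - k - 1)          # unbiased estimate of the error variance
s = math.sqrt(s2)               # SPSS "Std. Error of the Estimate"

print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}, s = {s:.5f}")
```

The results agree with the Model Summary reported for that model (.947, .947 and 6.60994).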

Page 19: 2 - Multiple Regression Models

Executive salaries: Computer Output (2)

Coefficients (Dependent Variable: Annual salary in $)

Term                              B            Std. Error   Beta   t        Sig.
(Constant)                        -37082.148   17052.089           -2.175   .032
Years of Experience               2696.360     173.647      .785   15.528   .000
Years of Education                2656.017     563.476      .243   4.714    .000
Number of Employees supervised    41.092       7.807        .272   5.264    .000
Corporate assets (in million $)   244.569      83.420       .149   2.932    .004

The first column lists the variables; the t and Sig. columns report the t-test for each coefficient.

Page 20: 2 - Multiple Regression Models

Testing overall significance: the F-test

• 1. Shows whether there is a linear relationship between all X variables together and Y

• 2. Uses the F test statistic

• 3. Hypotheses

– H0: β1 = β2 = ... = βk = 0 (no linear relationship)

– Ha: at least one coefficient is not 0 (at least one X variable affects Y)

The F-test for a single coefficient is equivalent to the t-test.

Page 21: 2 - Multiple Regression Models

Anova table

ANOVA (Dependent Variable: Annual salary in $)

Source       Sum of Squares   df   Mean Square   F        Sig.
Regression   4.766E10         4    1.192E10      74.045   .000
Residual     1.529E10         95   1.609E8
Total        6.295E10         99

Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised

Reading the table:

– Regression df = k, the number of regression slopes; Total df = n - 1, where n is the number of observations

– Residual Mean Square = MSE (mean square error), the estimate of the variance

– F is the F-statistic; Sig. is the p-value of the F-test

Decision: reject H0, i.e. the model as a whole is significant (at least one predictor is linearly related to Y).
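The F-statistic in the table is the ratio of the regression mean square to the residual mean square. A small sketch recomputing it from the sums of squares shown in the table (which SPSS prints rounded to four significant digits, so the result differs slightly from 74.045); the variable names are illustrative:

```python
# Values copied from the ANOVA table for the executive salaries model
ssr = 4.766e10      # regression sum of squares
sse = 1.529e10      # residual sum of squares
k = 4               # number of slopes (regression df)
n = 100             # total df is 99, so n = 100

msr = ssr / k                   # regression mean square
mse = sse / (n - k - 1)         # residual mean square, estimate of the variance
f_stat = msr / mse

print(f"MSR = {msr:.4g}, MSE = {mse:.4g}, F = {f_stat:.2f}")
```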

Page 22: 2 - Multiple Regression Models

Interaction (second order) model

E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2

• Interpretation of model parameters:

• β0: y-intercept. The value of E(Y) when x1 = x2 = 0

• β1+ β3 x2 : change in E(Y) for a 1-unit increase in x1 when x2 is held fixed;

• β2 + β3 x1 : change in E(Y) for a 1-unit increase in x2 when x1 is held fixed;

• β3: controls the rate of change of the surface.

Page 23: 2 - Multiple Regression Models

Interaction (second order) model

Contour lines are not parallel

E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2

The effect of one variable depends on the level of the other

Page 24: 2 - Multiple Regression Models

Example: Antique grandfather clocks auction

Clocks are sold at an auction on competitive offers. Data are:

– Y : auction price in dollars

– X1: age of clocks

– X2: number of bidders

Model 1: E(Y) = β0 + β1x1 + β2x2

Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2

Data: GFCLOCKS.sav

Page 25: 2 - Multiple Regression Models

Data summaries

Descriptive Statistics

Variable   N    Minimum   Maximum   Mean      Std. Deviation   Skewness (Std. Error)   Kurtosis (Std. Error)
Age        32   108       194       144.94    27.395           .216 (.414)             -1.323 (.809)
Bidders    32   5         15        9.53      2.840            .420 (.414)             -.788 (.809)
Price      32   729       2131      1326.88   393.487          .396 (.414)             -.727 (.809)

Valid N (listwise): 32

If data are Normal, Skewness is 0.

If data are Normal, (excess) Kurtosis is 0.

Note: Skewness and Kurtosis are not enough to establish Normality.

Page 26: 2 - Multiple Regression Models

P-P plot for Normality

If data are Normal, points should lie along the straight line.

In this example the situation is fairly good

Page 27: 2 - Multiple Regression Models

Bivariate scatter-plots

[Figure: two scatter plots of auction price (vertical axis, about 800 to 2000) against Age (about 120 to 180) and against Bidders (about 6 to 14).]

Page 28: 2 - Multiple Regression Models

Model 1: E(Y) = β0 + β1x1 + β2x2

Model Summary (Predictors: (Constant), Bidders, Age)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.945   .892       .885                133.485

ANOVA (Dependent Variable: Price)

Source       Sum of Squares   df   Mean Square   F         Sig.
Regression   4283062.960      2    2141531.480   120.188   .000
Residual     516726.540       29   17818.157
Total        4799789.500      31

Coefficients (Dependent Variable: Price)

Term         B           Std. Error   Beta   t        Sig.
(Constant)   -1338.951   173.809             -7.704   .000
Age          12.741      .905         .887   14.082   .000
Bidders      85.953      8.729        .620   9.847    .000

Page 29: 2 - Multiple Regression Models

Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2

Model Summary (Predictors: (Constant), AgeBid, Age, Bidders)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.977   .954       .949                88.915

ANOVA (Dependent Variable: Price)

Source       Sum of Squares   df   Mean Square   F         Sig.
Regression   4578427.367      3    1526142.456   193.041   .000
Residual     221362.133       28   7905.790
Total        4799789.500      31

Coefficients (Dependent Variable: Price)

Term         B         Std. Error   Beta    t        Sig.
(Constant)   320.458   295.141              1.086    .287
Age          .878      2.032        .061    .432     .669
Bidders      -93.265   29.892       -.673   -3.120   .004
AgeBid       1.298     .212         1.369   6.112    .000

Page 30: 2 - Multiple Regression Models

Interpreting interaction models

The coefficient for the interaction term is significant. If an interaction term is present, then the corresponding first-order terms also need to be included to correctly interpret the model.

In the example a careless analyst could conclude that the effect of Bidders is negative, since b2 = -93.26.

Since an interaction term is present, the slope estimate for Bidders (x2) is

b2 + b3x1

For x1 = 150 (age) the estimated slope for Bidders is

-93.26 + 1.3 (150) = 101.74

Note: b denotes the estimate β̂.
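The slope b2 + b3x1 can be evaluated at any age with a few lines of Python. This sketch uses the unrounded coefficients from the Model 2 output (b2 = -93.265, b3 = 1.298), so the value at age 150 comes out near 101.4 instead of the 101.74 obtained above with rounded coefficients. The function name `bidders_slope` is illustrative.

```python
# Coefficients copied from the Model 2 output of the clocks example
b2 = -93.265   # Bidders
b3 = 1.298     # AgeBid (Age * Bidders interaction)

def bidders_slope(age):
    """Estimated change in price for one additional bidder, at a given age."""
    return b2 + b3 * age

slope_150 = bidders_slope(150)
print(f"slope at age 150: {slope_150:.2f}")

# Ages in the sample range from 108 to 194 (see Descriptive Statistics)
for age in (108, 150, 194):
    print(age, round(bidders_slope(age), 2))
```

Over the observed age range the estimated slope for Bidders is positive throughout, which is why reading b2 alone is misleading.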

Page 31: 2 - Multiple Regression Models

Models with qualitative X’s

Regression models can also include qualitative (or categorical) independent variables (QIV).

The categories of a QIV are called levels

Since the levels of a QIV are not measured on a natural numerical scale, we need a specific type of coding to avoid introducing fictitious linear relations in the model.

Coding is done by using IVs which assume only two values: 0 or 1.

These coded IVs are called dummy variables.

Page 32: 2 - Multiple Regression Models

Models with QIV

• Suppose we want to model Income (Y) as a function of Sex (x) -> use coded, or dummy, variables

x = 1 if Male, x = 0 if Female

E(Y) = β0 + β1x

E(Y) = β0 + β1 if x = 1, i.e. Male

E(Y) = β0 if x = 0, i.e. Female

β0 is the mean for the base level, i.e. Female is the reference category

β1 is the additional effect if Male

In this simple model, only the means for the two groups are modeled.

Page 33: 2 - Multiple Regression Models

QIV with q levels

As a general rule, if a QIV has q levels we need q-1 dummies for coding. The uncoded level is the reference one.

Example: a QIV has three levels, A, B and C

Define x1 = 1 if level A, x1 = 0 if not

x2 = 1 if level B, x2 = 0 if not

C is the reference level

Model: E(Y) = β0 + β1x1 + β2x2

Interpreting β’s

β0 = μC (mean for base level C)

β1 = μA - μC (additional effect wrt C if level A)

β2 = μB - μC (additional effect wrt C if level B)
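The interpretation of the β’s can be checked numerically. The sketch below uses a small made-up data set (the values and variable names are invented for illustration only) with a three-level factor, codes it with q - 1 = 2 dummies, and compares the least-squares coefficients with the group means.

```python
import numpy as np

# Made-up illustration data: a factor with levels A, B, C and a response y
levels = np.array(["A", "B", "C", "A", "B", "C", "A", "C"])
y = np.array([10.0, 14.0, 7.0, 12.0, 15.0, 9.0, 11.0, 8.0])

# q - 1 = 2 dummies; the uncoded level C is the reference
x1 = (levels == "A").astype(float)
x2 = (levels == "B").astype(float)

X = np.column_stack([np.ones_like(y), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

mean_A = y[levels == "A"].mean()
mean_B = y[levels == "B"].mean()
mean_C = y[levels == "C"].mean()

print("beta0, beta1, beta2:", np.round(beta, 3))
print("mean_C, mean_A - mean_C, mean_B - mean_C:",
      mean_C, mean_A - mean_C, mean_B - mean_C)
```

Choosing a different reference level changes the individual coefficients but not the fitted group means.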

Page 34: 2 - Multiple Regression Models

Models with dummies

Dummies can be used in combination with any other dummies and quantitative X’s to construct models with first order effects (or main effects) and interactions to test hypotheses of interest.

Even if models which consider only dummy variables do in practice estimate the means of various groups, the testing machinery of the regression setup can be useful for group comparisons.

In order to define dummies in SPSS see “Computing dummy vars in SPSS.ppt”

Page 35: 2 - Multiple Regression Models

Example: executive salaries

A management consulting firm has developed a regression model in order to analyze executives’ salary structure

• Y = Annual salary (in dollars)

• x1 = Years of experience

• x2 = Years of education

• x3 = Gender : 1 if male; 0 if female

• x4 = Number of employees supervised

• x5 = Corporate assets (in millions of dollars)

Data: ExecSal.sav

Page 36: 2 - Multiple Regression Models

A simple model: E(Y) = β0 + β3x3

This model estimates the means of the two groups (M,F)

We want to test whether the difference in means is significant, i.e. not due to chance

Male group

Female group

Page 37: 2 - Multiple Regression Models

Regression Output

Model Summary (Predictors: (Constant), Gender)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.392   .153       .145                23320.282

ANOVA (Dependent Variable: Annual salary in $)

Source       Sum of Squares    df   Mean Square      F        Sig.
Regression   9651865066.845    1    9651865066.845   17.748   .000
Residual     53295882433.156   98   543835535.032
Total        62947747500.001   99

Coefficients (Dependent Variable: Annual salary in $)

Term         B           Std. Error   Beta   t        Sig.   95% CI Lower Bound   95% CI Upper Bound
(Constant)   83847.059   3999.395            20.965   .000   75910.389            91783.729
Gender       20739.305   4922.915     .392   4.213    .000   10969.940            30508.670

The Gender coefficient is the mean increment for Male; its 95% confidence interval is the C.I. for the mean increment.

Salary difference between groups is significant.

Page 38: 2 - Multiple Regression Models

Model 2: E(Y) = β0 + β1x1 + β3x3

Model 2 considers the same slope but different intercepts.

It seems that the two groups are separated.

If x3 = 0 (female) then E(Y) = β0 + β1x1

If x3 = 1 (male) then E(Y) = β0 + β3 + β1x1

Page 39: 2 - Multiple Regression Models

Computer output for model 2

Model Summary (Predictors: (Constant), Years of Experience, Gender)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.860   .740       .735                12981.615

ANOVA (Dependent Variable: Annual salary in $)

Source       Sum of Squares    df   Mean Square       F         Sig.
Regression   46601081714.527   2    23300540857.264   138.264   .000
Residual     16346665785.474   97   168522327.685
Total        62947747500.001   99

Coefficients (Dependent Variable: Annual salary in $)

Term                  B           Std. Error   Beta   t        Sig.   95% CI Lower Bound   95% CI Upper Bound
(Constant)            50614.312   3161.279            16.011   .000   44340.048            56888.576
Gender                18894.215   2743.253     .357   6.888    .000   13449.618            24338.812
Years of Experience   2633.831    177.875      .767   14.807   .000   2280.799             2986.863

R square improved greatly.

New intercept for Male is significant.

In this model the effect of experience is assumed equal for the two groups.

Page 40: 2 - Multiple Regression Models

Model 3: E(Y) = β0 + β1x1 + β3x3 + β4x1x3

With this model we want to test whether gender and experience interact, i.e. whether male salaries tend to grow at a faster (or slower) rate with experience.

If x3 = 0 (female) then E(Y) = β0 + β1x1

If x3 = 1 (male) then E(Y) = (β0 + β3) + (β1 + β4)x1

Here β0 + β3 is the new intercept for male and β1 + β4 is the new slope for male.

Remark: running the regression for the two groups together gives more degrees of freedom (a larger n) for estimating the parameters and the model variance.

Page 41: 2 - Multiple Regression Models

Model 3: E(Y) = β0 + β1x1 + β3x3 + β4x1x3

Model 3 considers different slopes and different intercepts.

Page 42: 2 - Multiple Regression Models

Computer output for model 3

Model Summary (Predictors: (Constant), ExpGender, Years of Experience, Gender)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.868   .754       .746                12700.080

Coefficients (Dependent Variable: Annual salary in $)

Term                  B           Std. Error   Beta   t        Sig.   95% CI Lower Bound   95% CI Upper Bound
(Constant)            58049.768   4461.179            13.012   .000   49194.397            66905.139
Gender                7798.504    5497.470     .147   1.419    .159   -3113.888            18710.896
Years of Experience   2044.541    308.565      .595   6.626    .000   1432.045             2657.036
ExpGender             864.122     373.653      .301   2.313    .023   122.426              1605.818

There is evidence that salaries for the two groups grow at different rates with experience.

Estimated lines:

Ŷ = 58049.8 + 2044.5*(Years of Experience) for female

Ŷ = 65848.3 + 2908.7*(Years of Experience) for male
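The two estimated lines follow directly from the Model 3 coefficients: for males the Gender coefficient is added to the intercept and the ExpGender coefficient to the slope. A short sketch of that arithmetic (the function name `line_for` is illustrative):

```python
# Coefficients copied from the Model 3 output
b_const = 58049.768    # (Constant)
b_gender = 7798.504    # Gender (1 = male)
b_exp = 2044.541       # Years of Experience
b_expgen = 864.122     # ExpGender (experience * gender interaction)

def line_for(gender):
    """Return (intercept, slope) of the estimated salary line for a group."""
    return b_const + b_gender * gender, b_exp + b_expgen * gender

female = line_for(0)   # about (58049.8, 2044.5)
male = line_for(1)     # about (65848.3, 2908.7)

for label, (a, b) in [("female", female), ("male", male)]:
    print(f"{label}: Y-hat = {a:.1f} + {b:.1f} * experience")
```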

Page 43: 2 - Multiple Regression Models

A complete second order model

E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

• Interpretation of model parameters:

• β0: y-intercept. The value of E(Y) when x1 = x2 = 0

• β1 and β2 : shifts along the x1 and x2 axes;

• β3 : rotation of the surface;

• β4 and β5 : control the rate of curvature.

Page 44: 2 - Multiple Regression Models

Back to Executive salaries

What if we suspect that the rate of growth changes with experience, and that the change has opposite signs for M and F?

x1 = Years of experience

x3 = Gender (1 if Male)

Model 4: E(Y) = β0 + β1x1 + β2x3 + β3x1x3 + β4x1²

Model 5: E(Y) = β0 + β1x1 + β2x3 + β3x1x3 + β4x1² + β5x3x1²

Note: x3² = x3 since it is a dummy

Page 45: 2 - Multiple Regression Models

Comparing Model 4 and 5

Model 4

If x3 = 0 (female) then E(Y) = β0 + β1x1 + β4x1²

If x3 = 1 (male) then E(Y) = (β0 + β2) + (β1 + β3)x1 + β4x1²

Different intercept and slope for M and F but same curvature

Model 5

If x3 = 0 (female) then E(Y) = β0 + β1x1 + β4x1²

If x3 = 1 (male) then E(Y) = (β0 + β2) + (β1 + β3)x1 + (β4 + β5)x1²

Different intercept, slope and curvature for M and F

Page 46: 2 - Multiple Regression Models

Model 5: computer output

Model Summary (Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.875   .766       .754                12507.735

ANOVA (Dependent Variable: Annual salary in $)

Source       Sum of Squares   df   Mean Square   F        Sig.
Regression   4.824E10         5    9.648E9       61.673   .000
Residual     1.471E10         94   1.564E8
Total        6.295E10         99

Page 47: 2 - Multiple Regression Models

Model 5: computer output

Coefficients (Dependent Variable: Annual salary in $)

Term                  B           Std. Error   Beta    t        Sig.
(Constant)            52391.973   6497.971             8.063    .000
Years of Experience   3373.970    1165.248     .982    2.895    .005
Gender                21122.152   8285.802     .399    2.549    .012
ExpGen                -2081.897   1459.842     -.724   -1.426   .157
ExpSqu                -53.181     45.001       -.422   -1.182   .240
Exp2Gen               112.836     54.950       .904    2.053    .043

Which model is preferable? Model 3 or model 5?

Page 48: 2 - Multiple Regression Models

A test for comparing nested models

Two models are nested if one model contains all the terms of the other model and at least one additional term.

The more complex of the two models is called the complete (or full) model.

The other is called the reduced (or restricted) model.

Example: model 1 is nested in model 2

Model 1: E(Y) = β0 + β1x1 + β2x2 + β3x1x2

Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

To compare the two models we are interested in testing

H0: β4 = β5 = 0, vs. H1: at least one, β4 or β5, differs from 0

Page 49: 2 - Multiple Regression Models

F-test for comparing nested models

Reduced model: E(Y) = β0 + β1x1 + … + βgxg

Complete model: E(Y) = β0 + β1x1 + … + βgxg + βg+1xg+1 + … + βkxk

To test H0: βg+1 = … = βk = 0

H1: at least one of the parameters being tested is not 0

Compute

$$F = \frac{(SSE_R - SSE_C)/(k-g)}{MSE_C}$$

Reject H0 when F > Fα, where Fα is the level α critical point of an F distribution with (k-g, n-(k+1)) d.f.

Page 50: 2 - Multiple Regression Models

F-test for nested models

Where:

SSER = Sum of squared errors for the reduced model;

SSEC = Sum of squared errors for the complete model;

MSEC = Mean square error for the complete model.

Remark:

k - g = number of parameters tested

k + 1 = number of parameters in the complete model

n = total sample size

Page 51: 2 - Multiple Regression Models

Compute partial F-tests with SPSS

1. Enter your complete model in the Regression dialog box
– choose the Method “Enter”

2. Click on “Next”

3. In the new box for Independent variables, enter those you want to remove (i.e. those you’d like to test)
– choose the Method “Remove”

4. In the “Statistics” option select “R squared change”

5. Ok.

Page 52: 2 - Multiple Regression Models

Applying the F-test

Let us use the F-test to compare Model 3 and Model 5 in the executive salaries example.

Model 3: E(Y) = β0 + β1x1 + β2x3 + β3x1x3

Model 5: E(Y) = β0 + β1x1 + β2x3 + β3x1x3 + β4x1² + β5x3x1²

Note that Model 3 is nested in Model 5

Apply the F-test for H0: β4 = β5 = 0

Page 53: 2 - Multiple Regression Models

Computer output

Variables Entered/Removed (Dependent Variable: Annual salary in $)

Model   Variables Entered                                            Variables Removed     Method
1       Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen (a)     .                     Enter
2       . (a)                                                        Exp2Gen, ExpSqu (b)   Remove

a. All requested variables entered.

b. All requested variables removed.

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .875   .766       .754                12507.735                    .766              61.673     5     94    .000
2       .868   .754       .746                12700.080                    -.012             2.488      2     94    .089

Model 1 predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen

Model 2 predictors: (Constant), Gender, Years of Experience, ExpGen

For Model 2, F Change is the F-statistic of the nested test and Sig. F Change is its p-value.

Do NOT reject H0: β4 = β5 = 0 at the 5% level, i.e. the simpler Model 3 is preferred.

Page 54: 2 - Multiple Regression Models

A quadratic model example: Shipping costs

– Y : cost of shipment in dollars

– X1: package weight in pounds

– X2: distance shipped in miles

Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

Data: Express.sav

Although a regional delivery service bases the charge for shipping a package on the package weight and distance shipped,

its profit per package depends on the package size (volume of space it occupies) and the size and nature of the delivery truck.

The company conducted a study to investigate the relationship between the cost of shipment and the variables

that control the shipping charge: weight and distance.

It is suspected that nonlinear effects may be present.

Page 55: 2 - Multiple Regression Models

Scatter plots

[Figure: two scatter plots of Cost of shipment (vertical axis, about 4 to 16) against Weight of parcel in lbs. (0 to 8) and against Distance shipped (50 to 250).]

Scatter plots in multiple regression are often not very informative.

Page 56: 2 - Multiple Regression Models

Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

Model Summary (Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.997   .994       .992                .4428

ANOVA (Dependent Variable: Cost of shipment)

Source       Sum of Squares   df   Mean Square   F         Sig.
Regression   449.341          5    89.868        458.388   .000
Residual     2.745            14   .196
Total        452.086          19

Coefficients (Dependent Variable: Cost of shipment)

Term                       B           Std. Error   Beta    t        Sig.
(Constant)                 .827        .702                 1.178    .259
Weight of parcel in lbs.   -.609       .180         -.316   -3.386   .004
Distance shipped           .004        .008         .062    .503     .623
Weight squared             .090        .020         .382    4.442    .001
Distance squared           1.51E-005   .000         .075    .672     .513
Weight*Distance            .007        .001         .850    11.495   .000

Distance squared is not significant: try to eliminate it.

Page 57: 2 - Multiple Regression Models

Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1²

Model Summary (Predictors: (Constant), Weight*Distance, Distance shipped, Weight squared, Weight of parcel in lbs.)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.997   .994       .992                .4346

ANOVA (Dependent Variable: Cost of shipment)

Source       Sum of Squares   df   Mean Square   F         Sig.
Regression   449.252          4    112.313       594.623   .000
Residual     2.833            15   .189
Total        452.086          19

Coefficients (Dependent Variable: Cost of shipment)

Term                       B       Std. Error   Beta    t        Sig.
(Constant)                 .475    .458                 1.035    .317
Weight of parcel in lbs.   -.578   .171         -.300   -3.387   .004
Distance shipped           .009    .003         .141    3.421    .004
Weight squared             .087    .019         .369    4.485    .000
Weight*Distance            .007    .001         .842    11.753   .000

Page 58: 2 - Multiple Regression Models

Applying the F-test: Shipping costs

– Y : cost of shipment in dollars

– X1: package weight in pounds

– X2: distance shipped in miles

Model 1: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2

Data: Express.sav

A company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance.

It is suspected that nonlinear effects may be present. Use the F-test for nested models to decide between Model 1 and Model 2.

Page 59: 2 - Multiple Regression Models

ANOVA Tables

Full model ANOVA (Dependent Variable: Cost of shipment)

Source       Sum of Squares   df   Mean Square   F         Sig.
Regression   449.341          5    89.868        458.388   .000
Residual     2.745            14   .196
Total        452.086          19

Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped

Reduced model ANOVA (Dependent Variable: Cost of shipment)

Source       Sum of Squares   df   Mean Square   F         Sig.
Regression   445.452          3    148.484       358.154   .000
Residual     6.633            16   .415
Total        452.086          19

Predictors: (Constant), Distance shipped, Weight of parcel in lbs., Weight*Distance

Page 60: 2 - Multiple Regression Models

F-statistic

To test H0: β4 = β5 = 0, from the ANOVA tables we have

$$F = \frac{(SSE_R - SSE_C)/2}{MSE_C} = \frac{(6.633 - 2.745)/2}{0.196} = 9.92$$

The critical value Fα (at the 5% level) for an F-distribution with 2 and 14 d.f. is 3.74.

Since F (9.92) > Fα (3.74), the null hypothesis is rejected at the 5% significance level, i.e. the model with quadratic terms is preferred over the reduced one.
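The same computation can be wrapped in a small helper. This is a sketch, not SPSS's procedure: the function name `partial_f` and its argument names are illustrative, and the critical value 3.74 is the one quoted above, not computed here.

```python
def partial_f(sse_reduced, sse_complete, k, g, n):
    """F-statistic for comparing nested models.

    k: number of slopes in the complete model
    g: number of slopes in the reduced model
    n: sample size
    """
    mse_complete = sse_complete / (n - (k + 1))
    return ((sse_reduced - sse_complete) / (k - g)) / mse_complete

# Shipping costs example: residual sums of squares from the two ANOVA tables
f_ship = partial_f(sse_reduced=6.633, sse_complete=2.745, k=5, g=3, n=20)
f_crit = 3.74   # 5% critical value for F with (2, 14) d.f., as quoted in the text

print(f"F = {f_ship:.2f}, reject H0: {f_ship > f_crit}")
```

Because the sums of squares in the tables are rounded, the result matches the SPSS value of 9.917 only to about two decimals.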

Page 61: 2 - Multiple Regression Models

Computer output

Variables Entered/Removed (Dependent Variable: Cost of shipment)

Model   Variables Entered                                                                                      Variables Removed                       Method
1       Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped (a)      .                                       Enter
2       . (a)                                                                                                  Distance squared, Weight squared (b)    Remove

a. All requested variables entered.

b. All requested variables removed.

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .997   .994       .992                .4428                        .994              458.388    5     14    .000
2       .993   .985       .983                .6439                        -.009             9.917      2     14    .002

Model 1 predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped

Model 2 predictors: (Constant), Weight*Distance, Weight of parcel in lbs., Distance shipped

For Model 2, F Change is the F-statistic of the nested test and Sig. F Change is its p-value.

Reject H0: β4 = β5 = 0

Page 62: 2 - Multiple Regression Models

Executive salaries: a final model (?)

• Y = Annual salary (in dollars)

• x1 = Years of experience

• x2 = Years of education

• x3 = Gender : 1 if male; 0 if female

• x4 = Number of employees supervised

• x5 = Corporate assets (in millions of dollars)

Try adding other variables to model 3

Model 6: E(Y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x3 + β5x4 + β6x5

Page 63: 2 - Multiple Regression Models

Computer Output: Model 6

Model Summary (Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.963   .927       .922                7020.089

ANOVA

Source       Sum of Squares   df   Mean Square   F         Sig.
Regression   5.836E10         6    9.727E9       197.384   .000
Residual     4.583E9          93   4.928E7
Total        6.295E10         99

Page 64: 2 - Multiple Regression Models

Computer Output: Model 6

Coefficients (Dependent Variable: Annual salary in $)

Term                              B            Std. Error   Beta   t        Sig.
(Constant)                        -38331.331   9533.238            -4.021   .000
Years of Experience               2178.964     171.979      .634   12.670   .000
Gender                            13203.101    3137.775     .249   4.208    .000
ExpGender                         669.546      209.042      .233   3.203    .002
Years of Education                2689.594     311.914      .246   8.623    .000
Number of Employees supervised    53.239       4.470        .353   11.910   .000
Corporate assets (in million $)   180.310      46.600       .110   3.869    .000

Page 65: 2 - Multiple Regression Models

Executive salaries: comparison of models

Mod.   Predictors                   Adj. R²   Standard error   F-stat
1      x1, x2, x4, x5               0.747     12685.31         74.05
2      x1, x3                       0.735     12981.62         138.26
3      x1, x3, x1∙x3                0.746     12700.08         98.09
6      x1, x2, x3, x1∙x3, x4, x5    0.922     7020.09          197.38