(Simple) Multiple linear regression and Nonlinear models
Multiple regression
• One response (dependent) variable: Y
• More than one predictor (independent) variable: X1, X2, X3, etc.
  – number of predictors = p
• Number of observations = n
Multiple regression - graphical interpretation
[Figure: two scatterplots, Y (0 to 15) vs X1 (0 to 7) and Y (0 to 15) vs X2 (7 to 12)]
Multiple regression graphical explanation.syd
Two possible single-variable models:
1) yᵢ = β₀ + β₁xᵢ₁ + εᵢ
2) yᵢ = β₀ + β₂xᵢ₂ + εᵢ
Which is a better fit?
Multiple regression - graphical interpretation
Multiple regression graphical explanation.syd
Two possible single-variable models:
1) yᵢ = β₀ + β₁xᵢ₁ + εᵢ
2) yᵢ = β₀ + β₂xᵢ₂ + εᵢ
Which is a better fit?
[Figure: Y vs X1, P = 0.02, r² = 0.67; Y vs X2, P = 0.61, r² = 0.00]
Multiple regression - graphical interpretation
Multiple regression graphical explanation.syd
Perhaps a multiple regression model would fit better:
yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + εᵢ
[Figure: Y vs X1 with the single-predictor fitted line ŷᵢ = b₀ + b₁xᵢ₁ and the residuals (observations 1 to 6) marked; Y vs X2 alongside]

X1   Y      expected   residual   X2
1    4      3.02       0.98       11.5
2    3      4.58       -1.58      9.25
3    5      6.14       -1.14      9.25
4    9      7.7        1.3        11.2
5    11.5   9.26       2.24       11.9
6    9      10.82      -1.82      8
Multiple regression - graphical interpretation
Multiple regression graphical explanation.syd
Perhaps a multiple regression model would fit better:
yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + εᵢ
[Figure: left – Y vs X1 with the fitted line ŷᵢ = b₀ + b₁xᵢ₁; right – the residuals from that fit (range -2 to 3) plotted against X2]
Multiple regression - graphical interpretation
Perhaps a multiple regression model would fit better:
yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + εᵢ
Estimated by: ŷᵢ = b₀ + b₁xᵢ₁ + b₂xᵢ₂

Whole Model
Summary of Fit
RSquare                      0.999469
RSquare Adj                  0.999114
Root Mean Square Error       0.100661
Mean of Response             6.916667
Observations (or Sum Wgts)   6

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model       2        57.177935       28.5890   2821.464   <.0001*
Error       3         0.030398        0.0101
C. Total    5        57.208333

Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|   VIF
Intercept   -11.22095   0.345539    -32.47    <.0001*    .
X1          1.8185158   0.025019    72.69     <.0001*    1.0810758
X2          1.1579816   0.030355    38.15     <.0001*    1.0810758

MULTIPLE REGRESSION EXAMPLE: X1, Y, X2
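As a sketch, this whole-model fit can be reproduced in Python with statsmodels. The data values below are read off the handout table above (rounded to the digits shown there), so the estimates may differ slightly in the last decimals from the JMP output.

```python
# Sketch: refitting the two-predictor example with ordinary least squares.
# X1, Y, X2 are transcribed (rounded) from the handout table above.
import numpy as np
import statsmodels.api as sm

X1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
X2 = np.array([11.5, 9.25, 9.25, 11.2, 11.9, 8.0])
Y = np.array([4, 3, 5, 9, 11.5, 9])

X = sm.add_constant(np.column_stack([X1, X2]))  # prepend the intercept column
fit = sm.OLS(Y, X).fit()
print(fit.summary())  # r-square, ANOVA F, b0, b1, b2, SEs, t ratios, P-values
```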
Multiple regression - statistics and partial residual plots
Multiple regression 1.syd
[Figure: scatterplot matrix (SPLOM) of X1, X2, X3, X4 and Y]
Overall model: y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε
Simple regression results
Multiple regression 1.syd
[Figure: scatterplot matrix of X1, X2, X3, X4 and Y]

Model              P-value
y = β₀ + β₁x₁      <0.00001
y = β₀ + β₁x₂      0.366
y = β₀ + β₁x₃      0.0127
y = β₀ + β₁x₄      0.580
Multiple regression - statistics
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε
P-values based on simple regressions: X1 < 0.0001, X2 = 0.366, X3 = 0.0127, X4 = 0.580
Multiple regression 1
Whole Model
Summary of Fit
RSquare                      0.999789
RSquare Adj                  0.999728
Root Mean Square Error       1.629515
Mean of Response             158.9474
Observations (or Sum Wgts)   19
AICc                         85.67214   (corrected Akaike Information Criterion; lower is better)
BIC                          84.33877   (Bayesian Information Criterion; lower is better)

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model       4       175741.77        43935.4   16546.21   <.0001*
Error      14           37.17            2.7
C. Total   18       175778.95

Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%   VIF
Intercept   -0.842913   0.984768    -0.86     0.4064     -2.95503    1.269205    .
X1          1.0060543   0.004829    208.35    <.0001*    0.9956979   1.0164106   1.3536955
X2          -1.053614   0.028305    -37.22    <.0001*    -1.114324   -0.992905   1.1403385
X3          0.9778513   0.065442    14.94     <.0001*    0.8374916   1.1182109   1.3763917
X4          -0.007318   0.013684    -0.53     0.6012     -0.036669   0.0220318   1.1322022
Multiple regression - partial residual plots
Multiple regression 1.syd
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε

Model                              Partial residual
y = β₀ + β₂x₂ + β₃x₃ + β₄x₄        Ypartial(1)
y = β₀ + β₁x₁ + β₃x₃ + β₄x₄        Ypartial(2)
y = β₀ + β₁x₁ + β₂x₂ + β₄x₄        Ypartial(3)
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃        Ypartial(4)
[Figures: top row – partial residuals Ypartial(1) to Ypartial(4) plotted against X1 to X4; bottom row – raw data (Y) plotted against X1 to X4]
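A minimal sketch of this partial-residual recipe in Python: for each predictor, refit the model with that predictor omitted and plot the residuals against the omitted predictor. The CSV file name and column names are hypothetical stand-ins for the Multiple regression 1.syd data.

```python
# Sketch of the handout's partial-residual procedure, not the original
# software. "multiple_regression_1.csv" is a hypothetical export of the
# .syd data with columns Y, X1, X2, X3, X4.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

df = pd.read_csv("multiple_regression_1.csv")
predictors = ["X1", "X2", "X3", "X4"]

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, left_out in zip(axes, predictors):
    kept = [p for p in predictors if p != left_out]
    fit = smf.ols("Y ~ " + " + ".join(kept), data=df).fit()
    ax.scatter(df[left_out], fit.resid)   # Ypartial(j) vs the omitted Xj
    ax.set_xlabel(left_out)
    ax.set_ylabel(f"Ypartial({left_out})")
plt.tight_layout()
plt.show()
```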
Regression models
Linear model:
yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + .... + εᵢ
Sample equation:
ŷᵢ = b₀ + b₁xᵢ₁ + b₂xᵢ₂ + ...
Partial regression coefficients
• H0: β₁ = 0
• The partial population regression coefficient (slope) for Y on X1, holding all other X's constant, equals zero
• Example: assume Y = bird abundance, X1 = patch area and X2 = year
  – the slope of the regression of Y against patch area, holding year constant, equals 0
Multiple regression plane
[Figure: the fitted multiple regression plane for Bird Abundance as a function of Years and Patch Area]
Testing H0: βᵢ = 0
• Use partial t-tests: t = bᵢ / SE(bᵢ)
• Compare with a t-distribution with n − p − 1 df, the residual df (for simple regression, p = 1, this is the familiar n − 2)
• Separate t-test for each partial regression coefficient in the model
• Usual logic of t-tests:
  – reject H0 if P < 0.05 (again, this is convention – don't feel tied to this)
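As a minimal sketch, the partial t-test is just the estimate over its standard error; here using the X1 row of the six-point example above (residual df = 6 − 3 = 3):

```python
# Sketch: partial t-test by hand. The values are the X1 row of the JMP
# Parameter Estimates table above; df_resid = n - (number of fitted parameters).
from scipy import stats

b_i, se_bi, df_resid = 1.8185158, 0.025019, 3
t = b_i / se_bi
p = 2 * stats.t.sf(abs(t), df_resid)   # two-tailed P
print(f"t = {t:.2f}, P = {p:.2g}")     # t = 72.69, P well below .0001
```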
Overall regression model
• H0: β₁ = β₂ = ... = 0 (all population slopes equal zero)
• Test of whether the overall regression equation is significant
• Use the ANOVA F-test:
  – variation explained by regression
  – unexplained (residual) variation
Assumptions
• Normality and homogeneity of variance for the response variable (previously discussed)
• Independence of observations (previously discussed)
• Linearity (previously discussed)
• No collinearity (a big deal in multiple regression)
Collinearity
• Collinearity:
  – predictors correlated
• Assumption of no collinearity:
  – predictor variables uncorrelated with (i.e. independent of) each other
• Effect of collinearity:
  – estimates of the βᵢs and their significance tests are unreliable
Checks for collinearity
• Correlation matrix and/or SPLOM between predictors
• Tolerance for each predictor:
  – 1 − r² for the regression of that predictor on all the others
  – if tolerance is low (near 0.1), collinearity is a problem
• VIF values:
  – 1/tolerance (variance inflation factor) – look for large values (>10)
• Condition indices (not in JMP – Pro):
  – greater than 15: be cautious
  – greater than 30: a serious problem
• Look at all indicators to determine the extent of collinearity (see the sketch below)
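A sketch of the tolerance/VIF check in Python, using statsmodels' variance_inflation_factor; the CSV file name and columns are hypothetical stand-ins for the Multiple regression 1.syd data:

```python
# Sketch: VIF and tolerance for each predictor.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("multiple_regression_1.csv")            # hypothetical export
X = sm.add_constant(df[["X1", "X2", "X3", "X4"]]).values
for i, name in enumerate(["X1", "X2", "X3", "X4"], start=1):  # skip constant
    vif = variance_inflation_factor(X, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")
```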
Scatterplots
• Scatterplot matrix (SPLOM):
  – pairwise plots for all variables
• Example: build a multiple regression model to predict total employment using the values of six independent variables. See Longley.syd
  – MODEL total = CONSTANT + deflator + gnp + unemployment + armforce + population + time
[Figure: scatterplot matrix (SPLOM) of DEFLATOR, GNP, UNEMPLOY, ARMFORCE, POPULATN and TIME]
Look at the relationships between the predictor variables – you can immediately see collinearity problems
Condition indices
1         2         3          4          5           6            7
1.00000   9.14172   12.25574   25.33661   230.42395   1048.08030   43275.04738
Dependent Variable            ¦ TOTAL
N                             ¦ 16
Multiple R                    ¦ 0.998
Squared Multiple R            ¦ 0.995
Adjusted Squared Multiple R   ¦ 0.992
Standard Error of Estimate    ¦ 304.854
Effect Coefficient Std Error Std Coef Tolerance t P(2 Tail)
CONSTANT -3.48226E+06 8.90420E+05 0.00000 . -3.91080 0.00356
DEFLATOR 15.06187 84.91493 0.04628 0.00738 0.17738 0.86314
GNP -0.03582 0.03349 -1.01375 0.00056 -1.06952 0.31268
UNEMPLOY -2.02023 0.48840 -0.53754 0.02975 -4.13643 0.00254
ARMFORCE -1.03323 0.21427 -0.20474 0.27863 -4.82199 0.00094
POPULATN -0.05110 0.22607 -0.10122 0.00251 -0.22605 0.82621
TIME 1829.15146 455.47850 2.47966 0.00132 4.01589 0.00304
Tolerance and Condition Indices (Longley.syz)
Variance Inflation Factor (VIF)
Confidence Interval for Regression Coefficients
¦ 95.0% Confidence Interval
Effect ¦ Coefficient Lower Upper VIF
---------+----------------------------------------------------------------
CONSTANT ¦ -3.482259E+006 -5.496529E+006 -1.467988E+006 .
DEFLATOR ¦ 15.061872 -177.029036 207.152780 135.532438
GNP ¦ -0.035819 -0.111581 0.039943 1,788.513483
UNEMPLOY ¦ -2.020230 -3.125067 -0.915393 33.618891
ARMFORCE ¦ -1.033227 -1.517949 -0.548505 3.588930
POPULATN ¦ -0.051104 -0.562517 0.460309 399.151022
TIME ¦ 1,829.151465 798.787513 2,859.515416 758.980597
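A sketch of how condition indices like those above can be computed (Belsley-style: scale each design-matrix column, intercept included, to unit length and take ratios of singular values); the CSV is a hypothetical export of the Longley data:

```python
# Sketch: condition indices from the singular values of the column-scaled
# design matrix. The largest index is the condition number.
import numpy as np
import pandas as pd

df = pd.read_csv("longley.csv")   # hypothetical export of Longley.syd
cols = ["DEFLATOR", "GNP", "UNEMPLOY", "ARMFORCE", "POPULATN", "TIME"]
X = np.column_stack([np.ones(len(df)), df[cols].to_numpy()])

Xs = X / np.linalg.norm(X, axis=0)        # columns scaled to unit length
s = np.linalg.svd(Xs, compute_uv=False)   # singular values, largest first
print(s[0] / s)                           # condition indices; > 30 is serious
```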
Solutions to collinearity
• Simplest – drop redundant (correlated) predictors
• Principal components regression – potentially useful
Best model?
• The model that best fits the data with the fewest predictors
• Criteria for comparing the fit of different models:
  – r² generally unsuitable
  – adjusted r² better
  – Mallow's Cp better
  – AIC best – lower values indicate better fit
Explained variance
r² = SS Regression / SS Total
– the proportion of variation in Y explained by the linear relationship with X1, X2, etc.
Screening models
• All subsets:
  – recommended
  – many models if many predictors (a big problem)
• Automated stepwise selection:
  – forward, backward, stepwise
  – NOT recommended unless you get the same model both ways
• Check AIC values
• Hierarchical partitioning:
  – contribution of each predictor to r²
Model comparison (simple version)
• Fit the full model:
  – y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + …
• Fit reduced models (e.g.):
  – y = β₀ + β₂x₂ + β₃x₃ + …
• Compare (see the F-test sketch below)
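Nested full and reduced models can be compared directly with an extra-sum-of-squares F test; a sketch using statsmodels' anova_lm (the CSV is again a hypothetical export of Multiple regression 1.syd):

```python
# Sketch: full vs reduced model comparison via anova_lm.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("multiple_regression_1.csv")   # hypothetical export
full = smf.ols("Y ~ X1 + X2 + X3 + X4", data=df).fit()
reduced = smf.ols("Y ~ X2 + X3", data=df).fit()
print(anova_lm(reduced, full))                  # F and P for the dropped terms
```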
Multiple regression 1
[Figure: scatterplot matrix of X1, X2, X3, X4 and Y from Multiple regression 1.syd]
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε
Any evidence of collinearity?
Model Building
Again, check for collinearity
Compare Models using AIC
• Model 1: y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄
  – AIC 78.67
  – corrected AIC (AICc) 85.67
• Model 2: y = β₀ + β₁x₁ + β₂x₂ + β₃x₃
  – AIC 77.06
  – corrected AIC (AICc) 81.67
Formally: Akaike information criterion (AIC, AICc)
Sometimes the following equation is used:
AIC = 2k + n·ln(RSS/n)
where k = number of fitted parameters, n = number of observations, RSS = residual sum of squares.
AICc corrects AIC for small sample size:
AICc = AIC + 2k(k+1)/(n − k − 1)
Lower score means better fit.
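In code, a sketch of these formulas. Note that parameter-counting conventions differ between packages: JMP keeps the full Gaussian-likelihood constants and counts the error variance in k, so its absolute AIC values differ from this simplified formula, but the small-sample correction term is the same.

```python
# Sketch of the slide's AIC/AICc formulas. k = number of fitted parameters,
# n = number of observations, rss = residual sum of squares.
import math

def aic(rss: float, n: int, k: int) -> float:
    return 2 * k + n * math.log(rss / n)

def aicc(rss: float, n: int, k: int) -> float:
    return aic(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)

# The correction for the full model above (n = 19, k = 6 with the error
# variance counted): 2*6*7 / 12 = 7.0 = 85.67 - 78.67, matching the slide.
print(aicc(37.17, 19, 6) - aic(37.17, 19, 6))   # 7.0
```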
Model Selection
All Possible Models
Ordered; up to the best 4 models for each of 1 to 4 terms per model.

Model          Number   RSquare   RMSE      AICc      BIC
X1             1        0.9702    17.5561   168.292   169.525
X3             1        0.3134    84.2609   227.895   229.129
X2             1        0.0482    99.2053   234.100   235.333
X4             1        0.0184    100.74    234.686   235.919
X1,X2          2        0.9963    6.3536    131.774   132.695
X1,X3          2        0.9767    16.0121   166.899   167.819
X1,X4          2        0.9718    17.5913   170.473   171.394
X3,X4          2        0.3346    85.4973   230.554   231.475
X1,X2,X3       3        0.9998    1.5903    81.6718   81.7786
X1,X2,X4       3        0.9964    6.4809    135.060   135.167
X1,X3,X4       3        0.9789    15.7401   168.780   168.887
X2,X3,X4       3        0.3440    87.6765   234.042   234.149
X1,X2,X3,X4    4        0.9998    1.6295    85.6721   84.3388
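"All possible models" is easy to brute-force when p is small; a sketch (hypothetical CSV export again), computing AICc with the full-likelihood convention so the values are on the same scale as JMP's:

```python
# Sketch: fit every subset of predictors and rank by AICc. With k counting
# slopes + intercept + error variance, this follows JMP's AIC convention
# (e.g. AICc 85.67 for the full model above).
from itertools import combinations

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("multiple_regression_1.csv")   # hypothetical export
predictors = ["X1", "X2", "X3", "X4"]

results = []
for r in range(1, len(predictors) + 1):
    for subset in combinations(predictors, r):
        fit = smf.ols("Y ~ " + " + ".join(subset), data=df).fit()
        n, k = fit.nobs, fit.df_model + 2       # + intercept + error variance
        aic = -2 * fit.llf + 2 * k
        aicc = aic + 2 * k * (k + 1) / (n - k - 1)
        results.append((aicc, ",".join(subset), fit.rsquared))

for aicc, model, r2 in sorted(results):
    print(f"{model:12s} AICc = {aicc:8.2f}   RSquare = {r2:.4f}")
```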
How important is each predictor variable to the model?
Compare models – sequential sum of squares
Model                                    Adjusted r²   Contribution to model r²
y = β₀ + β₁x₁                            0.96844       0.96844
y = β₀ + β₁x₁ + β₂x₂                     0.99587       0.02743
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃              0.99974       0.00387
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄       0.99973       -0.00001

(For reference, the output shown earlier is from the full model y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄.)
(Simple) Non-linear regression models
Non-linear regression
• Use when you cannot easily linearize a relationship (that is clearly non-linear)
• One response (dependent) variable: Y
• One predictor (independent) variable: X1
• Non-linear functions (of many types)
Regression models
Linear model:
yᵢ = β₀ + β₁x₁ + εᵢ
Non-linear model (one of many possible):
yᵢ = β₀ + β₁x₁^β₂ + εᵢ
Non-linear regression
• What is the hypothesis?
  – This is a very big question – let's come back to this
• What does r² mean?
  – In linear regression it is the explained variance divided by the total variance
  – In non-linear regression it is the same, but the explained variance can be calculated in two ways:
    • based on a total sum of squares of Σyᵢ² (raw r²)
    • based on Σ(yᵢ − ȳ)² (mean-corrected r²)
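In code, the two conventions differ only in which total sum of squares goes in the denominator; a sketch, assuming observed y and fitted yhat as NumPy arrays:

```python
# Sketch: raw vs mean-corrected r-squared for a (non-linear) fit.
import numpy as np

def raw_r2(y, yhat):
    # total SS about zero (not mean-corrected)
    return 1 - np.sum((y - yhat) ** 2) / np.sum(y ** 2)

def mean_corrected_r2(y, yhat):
    # the usual definition: total SS about the mean
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
```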
Non-linear regression
• What is the hypothesis?
[Figure: scatterplot of Y (0 to 60) against X (0 to 16)]
Non-linear regression (for example)
Fit Curve: y = a + b·Exp(c·x)

Model Comparison
Model            AICc        BIC         SSE         MSE         RMSE        R-Square
Exponential 3P   81.089952   79.922153   87.897377   7.3247814   2.7064333   0.9729491

[Figure: the data with the fitted Exponential 3P curve; Y from 0 to 50, X from 0 to 15]

Parameter Estimates (Exponential 3P)
Parameter     Estimate    Std Error   Lower 95%   Upper 95%
Asymptote     1.7609613   1.9559091   -2.07255    5.5944727
Scale         1.5794384   0.7859004   0.039102    3.1197748
Growth Rate   0.2293354   0.032577    0.1654857   0.2931851
What are the hypotheses?
Non-linear regression (many models might be adequate)
What are the hypotheses?
Exponential 2P: Y = a·Exp(b·X)                 parameters a, b
Exponential 3P: Y = a + b·Exp(c·X)             parameters a, b, c
Polynomial cubic: Y = a + b·X + c·X² + d·X³    parameters a, b, c, d
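These curves can be fitted by non-linear least squares; a sketch with scipy.optimize.curve_fit is below. The x and y arrays are simulated stand-ins for the plotted data (parameter values loosely based on the estimates above), and the p0 starting values are rough guesses, which non-linear fitting generally needs.

```python
# Sketch: fitting the three candidate curves with non-linear least squares.
import numpy as np
from scipy.optimize import curve_fit

def exp2p(x, a, b):
    return a * np.exp(b * x)

def exp3p(x, a, b, c):
    return a + b * np.exp(c * x)

def cubic(x, a, b, c, d):
    return a + b * x + c * x ** 2 + d * x ** 3

# Simulated placeholder data, loosely matching the Exponential 3P estimates.
x = np.linspace(0, 15, 30)
y = 1.8 + 1.6 * np.exp(0.23 * x) + np.random.default_rng(1).normal(0, 2.7, 30)

for f, p0 in [(exp2p, [1.0, 0.2]), (exp3p, [1.0, 1.0, 0.2]), (cubic, [1.0, 1.0, 0.0, 0.0])]:
    params, _ = curve_fit(f, x, y, p0=p0)
    rss = np.sum((y - f(x, *params)) ** 2)
    print(f.__name__, np.round(params, 4), f"RSS = {rss:.2f}")
```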
Comparing regression Models
• Evaluate assumptions – sometimes (as in the examples here) there are violations
• Simple (but not always correct) – compare adjusted r²
• Problem: what counts?
  – Particularly problematic when there are differences in the number of estimated parameters
• One solution: compare the added fit to the expected added fit (given the increased number of parameters)
  – One major restriction: models that are 'nested' are easier to compare
  – 'Nested' means the general form is the same, or can be made the same simply by modifying parameter values
Non-linear regression (many models might be adequate)
What are the hypotheses?
Fit Curve
Model Comparison
Model            AICc        AICc Weight   BIC         SSE         MSE         RMSE        R-Square
Exponential 2P   78.068182   0.810952      78.010515   92.690324   7.1300249   2.6702106   0.971474
Exponential 3P   81.089952   0.1789889     79.922153   87.897377   7.3247814   2.7064333   0.9729491
Cubic            86.847655   0.010059      83.72124    94.528911   8.5935373   2.9314736   0.9709082

[Figure: the data with all three fitted curves; Y from 0 to 50, X from 0 to 15]

Exponential 2P: Y = a·Exp(b·X)
Exponential 3P: Y = a + b·Exp(c·X)
Polynomial cubic: Y = a + b·X + c·X² + d·X³
Multiple and Non-Linear Regression
• Be careful!
• Know what your hypotheses are
• Understand how to build models to test your hypotheses
• Understand statistical output – you may be misled if you don't