Multiple linear regression
TRANSCRIPT
(Simple) Multiple linear regression
Multiple regression
• One response (dependent) variable: Y
• More than one predictor (independent) variable: X1, X2, X3, etc.
  - number of predictors = p
• Number of observations = n
Multiple regression - graphical interpretation
Multiple regression graphical explanation.syd
[Figure: two scatterplots, Y (0-15) against X1 (0-7) and Y (0-15) against X2 (7-12).]
Two possible single-variable models:
1) $y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$
2) $y_i = \beta_0 + \beta_2 x_{i2} + \varepsilon_i$
Which is a better fit?
Multiple regression - graphical interpretation
Multiple regression graphical explanation.syd
[Figure: the same two scatterplots with fitted lines and test results.]
Fitting the two single-variable models:
1) $y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$: P = 0.02, r² = 0.67
2) $y_i = \beta_0 + \beta_2 x_{i2} + \varepsilon_i$: P = 0.61, r² = 0.00
Which is a better fit?
Multiple regression - graphical interpretation
Multiple regression graphical explanation.syd
Perhaps a multiple regression model would fit better:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$
[Figure: Y against X1 with the fitted line $\hat{y}_i = b_0 + b_1 x_{i1}$ and the residuals marked for points 1-6; Y against X2.]

X1   Y      Expected   Residual   X2
1    4      3.02       0.98       11.5
2    3      4.58       -1.58      9.25
3    5      6.14       -1.14      9.25
4    9      7.70       1.30       11.2
5    11.5   9.26       2.24       11.9
6    9      10.82      -1.82      8

Expected values come from the single-predictor fit $\hat{y}_i = b_0 + b_1 x_{i1}$; residual = Y - expected. (A minimal sketch of these fits follows.)
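As a minimal sketch (assuming Python with statsmodels, which is not the package used in the lecture), the six rows above are enough to reproduce the single-predictor fit and the two-predictor model:

```python
# Sketch: single-predictor and two-predictor OLS fits for the six rows
# in the table above. statsmodels is an assumption; JMP/SYSTAT report
# the same quantities.
import numpy as np
import statsmodels.api as sm

x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([11.5, 9.25, 9.25, 11.2, 11.9, 8.0])
y = np.array([4, 3, 5, 9, 11.5, 9])

# Simple regression y = b0 + b1*x1: gives b0 ~ 1.47, b1 ~ 1.56,
# so the expected y at x1 = 1 is ~3.02, matching the table.
fit1 = sm.OLS(y, sm.add_constant(x1)).fit()
print(fit1.params, fit1.rsquared)

# Multiple regression y = b0 + b1*x1 + b2*x2
X = sm.add_constant(np.column_stack([x1, x2]))
fit12 = sm.OLS(y, X).fit()
print(fit12.params, fit12.rsquared)
```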
Multiple regression - graphical interpretation
Multiple regression graphical explanation.syd
Perhaps a multiple regression model would fit better:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$
[Figure: left, Y against X1 with the fitted line $\hat{y}_i = b_0 + b_1 x_{i1}$; right, the residuals of that fit (-2 to 3) plotted against X2. Same data table as above.]
Multiple regression - graphical interpretation
Perhaps a multiple regression model would fit better:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$
estimated by
$\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2}$
MULTIPLE REGRESSION EXAMPLE: X1, Y, X2
Multiple regression - statistics and partial residual plots
Multiple regression 1.syd
[Diagram: predictors X1-X4 each linked to the response Y.]
Overall model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
Simple regression results
Multiple regression 1.syd
[Diagram: X1-X4 each linked to Y.]

Model                          P-value
$y = \beta_0 + \beta_1 x_1$    <0.00001
$y = \beta_0 + \beta_1 x_2$    0.366
$y = \beta_0 + \beta_1 x_3$    0.0127
$y = \beta_0 + \beta_1 x_4$    0.580
Multiple regression - statistics
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
P-values based on simple regressions: X1 0.0001, X2 0.366, X3 0.0127, X4 0.580
Multiple regression 1
Akaike (corrected) Information Criterion (lower is better)
Bayesian Information Criterion (lower is better)
Multiple regression - partial residual plots
Multiple regression 1.syd
Full model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$

Model (one predictor dropped)                              Partial residual
$y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$    Ypartial(1)
$y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_4$    Ypartial(2)
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_4 x_4$    Ypartial(3)
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$    Ypartial(4)
[Figure: eight scatterplots. Top row: partial residuals Ypartial(1)-Ypartial(4) against X1-X4 respectively. Bottom row: raw data (Y) against X1-X4.]
Partial residuals vs Xi; raw data (Y) vs Xi. (A sketch of how these partial residuals can be computed follows.)
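A hedged sketch of the construction defined on the slide: fit the model with one predictor dropped and plot its residuals against the dropped predictor. The data frame below is synthetic stand-in data, not the lecture's Multiple regression 1.syd file:

```python
# Sketch: Ypartial(i) = residuals of the model fitted WITHOUT predictor Xi,
# plotted against Xi (as defined in the table above).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=["X1", "X2", "X3", "X4"])
df["Y"] = 2 + 3 * df.X1 + 1.5 * df.X2 + rng.normal(size=50)  # synthetic

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for i, (ax, dropped) in enumerate(zip(axes, ["X1", "X2", "X3", "X4"]), start=1):
    kept = [c for c in ["X1", "X2", "X3", "X4"] if c != dropped]
    reduced = sm.OLS(df["Y"], sm.add_constant(df[kept])).fit()
    ax.scatter(df[dropped], reduced.resid)
    ax.set_xlabel(dropped)
    ax.set_ylabel(f"Ypartial({i})")
plt.show()
```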
Regression models
Linear model:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \varepsilon_i$
Sample equation:
$\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots$
Partial regression coefficients
• H0: $\beta_1 = 0$
• Partial population regression coefficient (slope) for Y on X1, holding all other X's constant, equals zero
• Example: assume Y = bird abundance, X1 = patch area and X2 = year
  - slope of the regression of Y against patch area, holding years constant, equals 0
Multiple regression plane
[Figure: 3-D regression plane of bird abundance against years and patch area.]
Testing H0: $\beta_i = 0$
• Use partial t-tests: $t = b_i / SE_{b_i}$
• Compare with a t-distribution with n - p - 1 df
• Separate t-test for each partial regression coefficient in the model
• Usual logic of t-tests: reject H0 if P < 0.05 (again, this is convention - don't feel tied to this)
(A sketch of this calculation follows.)
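A minimal sketch of the partial t-test, reusing the two-predictor fit `fit12` from the earlier sketch (scipy is an assumption):

```python
# Partial t-tests: t = b_i / SE(b_i), compared with a t-distribution
# on the residual degrees of freedom (n - p - 1).
import numpy as np
from scipy import stats

t_vals = fit12.params / fit12.bse                       # b_i / SE(b_i)
p_vals = 2 * stats.t.sf(np.abs(t_vals), df=fit12.df_resid)
print(t_vals, p_vals)   # same as fit12.tvalues and fit12.pvalues
```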
Overall regression model
• H0: $\beta_1 = \beta_2 = \dots = 0$ (all population slopes equal zero)
• Test of whether the overall regression equation is significant
• Use ANOVA F-test:
  - variation explained by regression
  - unexplained (residual) variation
Assumptions
• Normality and homogeneity of variance for the response variable (previously discussed)
• Independence of observations (previously discussed)
• Linearity (previously discussed)
• No collinearity (a big deal in multiple regression)
Collinearity
• Collinearity: predictors correlated
• Assumption of no collinearity: predictor variables uncorrelated with (i.e. independent of) each other
• Effect of collinearity: estimates of the $\beta_i$s and their significance tests are unreliable
Checks for collinearity
• Correlation matrix and/or SPLOM between predictors
• Tolerance for each predictor:
  - 1 - r² for the regression of that predictor on all the others
  - if tolerance is low (near 0.1), collinearity is a problem
• VIF values:
  - 1/tolerance (variance inflation factor)
  - look for large values (>10)
• Condition indices (not in JMP Pro):
  - greater than 15: be cautious
  - greater than 30: a serious problem
• Look at all indicators to determine the extent of collinearity (see the sketch after this list)
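A minimal sketch of the tolerance/VIF check, assuming a pandas DataFrame `X` holding only the predictor columns:

```python
# Tolerance and VIF: regress each predictor on all the others;
# tolerance = 1 - r^2 of that regression, VIF = 1 / tolerance.
import pandas as pd
import statsmodels.api as sm

def tolerance_and_vif(X: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for col in X.columns:
        others = sm.add_constant(X.drop(columns=col))
        r2 = sm.OLS(X[col], others).fit().rsquared
        rows.append({"predictor": col, "tolerance": 1 - r2, "VIF": 1 / (1 - r2)})
    return pd.DataFrame(rows)
```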
Scatterplots
• Scatterplot matrix (SPLOM): pairwise plots for all variables
• Example: build a multiple regression model to predict total employment using values of six independent variables. See Longley.syd
  - MODEL total = CONSTANT + deflator + gnp + unemployment + armforce + population + time
[Figure: SPLOM of DEFLATOR, GNP, UNEMPLOY, ARMFORCE, POPULATN and TIME.]
Look at the relationships between the predictor variables - immediately you can see collinearity problems.
Checks for collinearity (recap): correlation matrix/SPLOM, tolerance, VIF values, condition indices; look at all indicators to determine the extent of collinearity.
Condition indices
1          2          3           4           5           6            7
1.00000    9.14172    12.25574    25.33661    230.42395   1048.08030   43275.04738

Tolerance and condition indices (Longley.syz):

Dependent Variable: TOTAL
N: 16
Multiple R: 0.998
Squared Multiple R: 0.995
Adjusted Squared Multiple R: 0.992
Standard Error of Estimate: 304.854

Effect     Coefficient    Std Error     Std Coef   Tolerance   t          P(2 Tail)
CONSTANT   -3.48226E+06   8.90420E+05   0.00000    .           -3.91080   0.00356
DEFLATOR   15.06187       84.91493      0.04628    0.00738     0.17738    0.86314
GNP        -0.03582       0.03349       -1.01375   0.00056     -1.06952   0.31268
UNEMPLOY   -2.02023       0.48840       -0.53754   0.02975     -4.13643   0.00254
ARMFORCE   -1.03323       0.21427       -0.20474   0.27863     -4.82199   0.00094
POPULATN   -0.05110       0.22607       -0.10122   0.00251     -0.22605   0.82621
TIME       1829.15146     455.47850     2.47966    0.00132     4.01589    0.00304

Variance inflation factor (VIF) and confidence intervals for the regression coefficients:

Effect     Coefficient      95% CI Lower     95% CI Upper     VIF
CONSTANT   -3.482259E+06    -5.496529E+06    -1.467988E+06    .
DEFLATOR   15.061872        -177.029036      207.152780       135.532438
GNP        -0.035819        -0.111581        0.039943         1788.513483
UNEMPLOY   -2.020230        -3.125067        -0.915393        33.618891
ARMFORCE   -1.033227        -1.517949        -0.548505        3.588930
POPULATN   -0.051104        -0.562517        0.460309         399.151022
TIME       1829.151465      798.787513      2859.515416       758.980597

(A sketch of how condition indices can be computed follows.)
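A hedged sketch of one common convention for condition indices (Belsley-style: scale the design-matrix columns to unit length, then take ratios of singular values; exact conventions differ between packages):

```python
# Condition indices: ratio of the largest singular value of the
# column-scaled design matrix to each of the other singular values.
import numpy as np

def condition_indices(X: np.ndarray) -> np.ndarray:
    Xs = X / np.linalg.norm(X, axis=0)       # unit-length columns
    s = np.linalg.svd(Xs, compute_uv=False)  # singular values, descending
    return s.max() / s                       # first index is always 1.0
```

Values above 15 suggest caution; above 30, a serious problem (as noted earlier).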
Solutions to collinearity
• Simplest: drop redundant (correlated) predictors
• Principal components regression: potentially useful
Best model?
• The model that best fits the data with the fewest predictors
• Criteria for comparing the fit of different models:
  - r²: generally unsuitable
  - adjusted r²: better
  - Mallows' Cp: better
  - AIC: best - lower values indicate better fit
Explained variance
$r^2 = \dfrac{SS_{\text{Regression}}}{SS_{\text{Total}}}$
The proportion of variation in Y explained by the linear relationship with X1, X2, etc.
Screening models
• All subsets - recommended
  - many models if many predictors (a big problem)
• Automated stepwise selection: forward, backward, stepwise
  - NOT recommended unless you get the same model both ways
• Check AIC values
• Hierarchical partitioning - contribution of each predictor to r²
Model comparison (simple version)
• Fit the full model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$
• Fit reduced models (e.g.): $y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \dots$
• Compare (see the sketch below)
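One way to compare nested full and reduced fits is an extra-sum-of-squares F-test; a minimal sketch assuming the synthetic `df` with columns Y, X1-X4 from the partial-residual sketch:

```python
# Full vs reduced model: statsmodels' compare_f_test performs the
# extra-sum-of-squares F-test for nested OLS models.
import statsmodels.api as sm

full = sm.OLS(df["Y"], sm.add_constant(df[["X1", "X2", "X3", "X4"]])).fit()
reduced = sm.OLS(df["Y"], sm.add_constant(df[["X2", "X3"]])).fit()

f_stat, p_value, df_diff = full.compare_f_test(reduced)
print(f_stat, p_value, df_diff)
```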
Multiple regression 1
[Figure: SPLOM of X1, X2, X3, X4 and Y.]
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
Any evidence of collinearity?
Model building
Again, check for collinearity.
Compare models using AIC:
• Model 1: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
  - AIC 78.67
  - corrected AIC 85.67
• Model 2: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
  - AIC 77.06
  - corrected AIC 81.67
Formally: Akaike information criterion (AIC, AICc)
Sometimes the following equation is used: AIC = 2k + n[ln(RSS/n)]
where
k = number of fitted parameters
n = number of observations
$\hat{\sigma}^2$ = residual sum of squares (RSS) / n
AICc = AIC corrected for small sample size. A lower score means a better fit.

AIC:  $n \ln(\hat{\sigma}^2) + 2(p + 1)$
AICc: $n \ln(\hat{\sigma}^2) + 2(p + 1) + \dfrac{2(p + 1)(p + 2)}{n - p - 2}$

(A sketch of this calculation follows.)
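A minimal sketch of these formulas (p = number of predictors; note that likelihood-based AIC values from statistics packages can differ by an additive constant):

```python
# AIC and AICc from the residual sum of squares, per the slide's formulas.
import numpy as np

def aic_aicc(rss: float, n: int, p: int) -> tuple[float, float]:
    sigma2 = rss / n                                    # sigma-hat^2
    aic = n * np.log(sigma2) + 2 * (p + 1)
    aicc = aic + (2 * (p + 1) * (p + 2)) / (n - p - 2)  # small-sample correction
    return aic, aicc
```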
Model selection
How important is each predictor variable to the model?
Compare models using sequential sums of squares, fitting the nested sequence:
$y = \beta_0 + \beta_1 x_1$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
For reference, the output from the full model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$ is shown alongside.
Compare models - sequential sums of squares

Model                                                                   Adjusted r²   Contribution to model r²
$y = \beta_0 + \beta_1 x_1$                                             0.96844       0.96844
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$                               0.99587       0.02743
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$                 0.99974       0.00387
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$   0.99973       -0.00001

(A sketch reproducing this kind of comparison follows.)
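A minimal sketch of the nested comparison, again assuming the synthetic `df` with columns Y, X1-X4:

```python
# Adjusted r^2 for each nested model and each step's contribution,
# mirroring the table above.
import statsmodels.api as sm

terms, prev = ["X1", "X2", "X3", "X4"], 0.0
for k in range(1, len(terms) + 1):
    fit = sm.OLS(df["Y"], sm.add_constant(df[terms[:k]])).fit()
    print(terms[:k], round(fit.rsquared_adj, 5), round(fit.rsquared_adj - prev, 5))
    prev = fit.rsquared_adj
```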
(Simple) Non-linear regression models
Non-linear regression
• Use when you cannot easily linearize a relationship (that is clearly non-linear)
• One response (dependent) variable: Y
• One predictor (independent) variable: X1
• Non-linear functions (of many types)
Regression models
Linear model:
$y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$
Non-linear model (one of many possible):
$y_i = \beta_0 + \beta_1 x_{i1}^2 + \varepsilon_i$
Non-linear regression
• What is the hypothesis?
  - This is a very big question - let's come back to this
• What does r² mean?
  - In linear regression it is the explained variance divided by the total variance
  - In non-linear regression it is the same, but the explained variance can be calculated in two ways:
    • Raw r²: based on $\sum \hat{y}_i^2$
    • Mean-corrected r²: based on $\sum (\hat{y}_i - \bar{y})^2$
Non-linear regression
• What is the hypothesis?
[Figure: scatterplot of Y (0-60) against X (0-16) with a fitted exponential curve.]
Non-linear regression (for example): Y = a + b*Exp(c*X)
What are the hypotheses?
Non-linear regression (many models might be adequate)
What are the hypotheses?
Exponential 2p:    Y = a*Exp(b*X)              (parameters a, b)
Exponential 3p:    Y = a + b*Exp(c*X)          (parameters a, b, c)
Polynomial cubic:  Y = a + b*X + c*X² + d*X³   (parameters a, b, c, d)
(A sketch of fitting these models follows.)
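A minimal sketch of fitting the three candidate models by non-linear least squares (SciPy's curve_fit is an assumption; the x, y data below are synthetic placeholders, not the lecture's):

```python
# Fit each candidate model by non-linear least squares and report its
# residual sum of squares for comparison.
import numpy as np
from scipy.optimize import curve_fit

def exp2p(x, a, b):        # Exponential 2p: Y = a*Exp(b*X)
    return a * np.exp(b * x)

def exp3p(x, a, b, c):     # Exponential 3p: Y = a + b*Exp(c*X)
    return a + b * np.exp(c * x)

def cubic(x, a, b, c, d):  # Polynomial cubic: Y = a + b*X + c*X^2 + d*X^3
    return a + b * x + c * x**2 + d * x**3

rng = np.random.default_rng(1)
x = np.linspace(0, 16, 40)                        # synthetic stand-in data
y = 2 + 0.5 * np.exp(0.28 * x) + rng.normal(0, 1, 40)

for f, p0 in [(exp2p, (1, 0.1)), (exp3p, (1, 1, 0.1)), (cubic, (1, 1, 0, 0))]:
    params, _ = curve_fit(f, x, y, p0=p0, maxfev=10000)
    rss = np.sum((y - f(x, *params)) ** 2)
    print(f.__name__, np.round(params, 3), "RSS:", round(rss, 2))
```

Starting values (p0) matter for non-linear fits; poor choices can prevent convergence.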
Comparing regression models
• Evaluate assumptions - sometimes (as in the examples here) there are violations
• Simple (but not always correct): compare adjusted r²
• Problem: what counts?
  - particularly problematic when there are differences in the number of estimated parameters
• One solution: compare the added fit against the added fit expected from the increased number of parameters
  - one major restriction: models that are 'nested' are easier to compare
  - nested means the general form is the same, or can be made the same simply by modifying parameter values
Non-linear regression (many models might be adequate) - recap: Exponential 2p, Exponential 3p, Polynomial cubic, as above.
Multiple and non-linear regression
• Be careful!
• Know what your hypotheses are
• Understand how to build models to test your hypotheses
• Understand statistical output - you may be misled if you don't