Multiple and complex regression
Extensions of simple linear regression
• Multiple regression models: predictor variables are continuous
• Analysis of variance: predictor variables are categorical (grouping variables)
• But… general linear models can include both continuous and categorical predictors
Relative abundance of C3 and C4 plants
• Paruelo & Lauenroth (1996)
• Geographic distribution and the effects of climate variables on the relative abundance of a number of plant functional types (PFTs): shrubs, forbs, succulents, C3 grasses and C4 grasses.
Data
73 sites across temperate central North America
Response variable
• Relative abundance of PFTs (based on cover, biomass, and primary production) for each site
Predictor variables
• Longitude
• Latitude
• Mean annual temperature
• Mean annual precipitation
• Winter (%) precipitation
• Summer (%) precipitation
• Biome (grassland, shrubland)
Relative abundance was transformed as ln(dat + 1) because its distribution is positively skewed.
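A minimal sketch of why a transformation helps with positively skewed abundances. The values in `c3` are hypothetical proportions, not the study data, and `skewness` is an illustrative helper (population skewness); the ln(dat + 1) form follows the slide, and the square root is included because SQRT_C3 is used later in the deck.

```python
import math

def skewness(values):
    """Population skewness: third central moment divided by sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical relative abundances (proportions); NOT the Paruelo & Lauenroth data.
c3 = [0.0, 0.01, 0.02, 0.05, 0.05, 0.10, 0.15, 0.20, 0.40, 0.80]

log_c3 = [math.log(v + 1.0) for v in c3]  # ln(dat + 1), as stated on the slide
sqrt_c3 = [math.sqrt(v) for v in c3]      # square-root alternative

skew_raw = skewness(c3)
skew_log = skewness(log_c3)
skew_sqrt = skewness(sqrt_c3)

print(f"raw  skewness: {skew_raw:.2f}")
print(f"ln+1 skewness: {skew_log:.2f}")
print(f"sqrt skewness: {skew_sqrt:.2f}")
```

Both transformations pull in the long right tail; which one is preferable for real data is usually judged from the histograms.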
[Figure: four histograms of the response, with Frequency on the y-axis: C3 (untransformed), log_10_C3, log_C3 and SQRT_C3]
Collinearity
• Causes computational problems: it pushes the determinant of the matrix of X-variables (X'X) close to zero, and matrix inversion essentially divides by that determinant, so the estimates become very sensitive to small differences in the numbers
• Standard errors of the estimated regression slopes are inflated
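The two points above can be illustrated with a small sketch. It assumes two standardized predictors, for which X'X is proportional to their 2 x 2 correlation matrix; the helper names are illustrative. The value 0.839 is the magnitude of the LAT and MAT correlation reported in this deck; the other values of r are hypothetical.

```python
def corr_matrix(r):
    """Correlation matrix of two predictors with correlation r."""
    return [[1.0, r], [r, 1.0]]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2 x 2 matrix; every entry is divided by the determinant."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

for r in (0.0, 0.839, 0.99, 0.999):
    m = corr_matrix(r)
    # The determinant is 1 - r^2, so it shrinks towards zero as |r| approaches 1,
    # and the diagonal of the inverse (the variance inflation) grows accordingly.
    print(f"r = {r:5.3f}  det = {det2(m):.6f}  inverse diagonal = {inv2(m)[0][0]:.1f}")
```

As r approaches 1 the determinant goes to zero and the entries of the inverse grow without bound, which is what inflates the standard errors of the slopes.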
Detecting collinearity
• Check tolerance values
• Plot the variables
• Examine a matrix of correlation coefficients between predictor variables
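As a hedged sketch of the tolerance check: tolerance for predictor j is 1 - R²_j, where R²_j comes from regressing predictor j on the other predictors, and the variance inflation factor (VIF) is its reciprocal. The function names are illustrative. The two-predictor example treats LAT and MAT as if they were the only predictors in a model, which is an assumption made here for illustration; in that case R²_j is just the squared pairwise correlation.

```python
def tolerance(r2_j):
    """Tolerance of predictor j, given R^2 from regressing it on the other predictors."""
    return 1.0 - r2_j

def vif(tol):
    """Variance inflation factor: the reciprocal of tolerance."""
    if tol <= 0.0:
        raise ValueError("tolerance must be positive")
    return 1.0 / tol

# Hypothetical two-predictor model with LAT and MAT only (r = -.839 in this deck).
r_lat_mat = -0.839
tol_lat_mat = tolerance(r_lat_mat ** 2)
vif_lat_mat = vif(tol_lat_mat)
print(f"tolerance = {tol_lat_mat:.3f}, VIF = {vif_lat_mat:.2f}")

# Tolerances printed in the centered-model coefficient table of this deck.
for tol in (0.980, 0.827, 0.820):
    print(f"tolerance {tol:.3f} -> VIF {vif(tol):.3f}")
```

Tolerances near 1 indicate little redundancy; values near 0 (large VIF) flag a predictor that is almost a linear combination of the others.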
Dealing with collinearity
• Omit predictor variables if they are highly correlated with other predictor variables that remain in the model
Correlations
[Figure: scatterplot matrix of the predictor variables LAT, LONG, MAP, MAT, JJAMAP and DJFMAP]
Correlations
Pearson correlation, with 2-tailed significance in parentheses; N = 73 for every pair.

          LAT             LONG            MAP             MAT             JJAMAP          DJFMAP
LAT       1               .097 (.416)     -.247* (.036)   -.839** (.000)  .074 (.533)     -.065 (.584)
LONG      .097 (.416)     1               -.734** (.000)  -.213 (.070)    -.492** (.000)  .771** (.000)
MAP       -.247* (.036)   -.734** (.000)  1               .355** (.002)   .112 (.344)     -.405** (.000)
MAT       -.839** (.000)  -.213 (.070)    .355** (.002)   1               -.081 (.497)    .001 (.990)
JJAMAP    .074 (.533)     -.492** (.000)  .112 (.344)     -.081 (.497)    1               -.792** (.000)
DJFMAP    -.065 (.584)    .771** (.000)   -.405** (.000)  .001 (.990)     -.792** (.000)  1

* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).
Model: $\ln(C3) = \beta_0 + \beta_1(\text{lat}) + \beta_2(\text{long}) + \beta_3(\text{lat} \times \text{long})$

Coefficients (Model 1, dependent variable: LC3), uncentered predictors

              B        Std. Error   Beta      t         Sig.    Tolerance   VIF
(Constant)    7.391    3.625                  2.039     .045
LAT           -.191    .091         -3.095    -2.101    .039    .003        307.745
LONG          -.093    .035         -1.824    -2.659    .010    .015        66.784
LOXLA         .002     .001         4.323     2.572     .012    .002        400.939

Coefficients (Model 1, dependent variable: LC3), after centering both lat and long

              B        Std. Error   Beta      t         Sig.    Tolerance   VIF
(Constant)    -.553    .027                   -20.131   .000
LONRE         -.003    .004         -.051     -.597     .552    .980        1.020
LATRE         .048     .006         .783      8.484     .000    .827        1.209
RELALO        .002     .001         .238      2.572     .012    .820        1.220

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are the collinearity statistics.
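A small sketch of why centering helps when the model contains an interaction term. The latitude and longitude values form a hypothetical balanced grid, not the 73 study sites, and `pearson` and `center` are illustrative helpers.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical balanced grid of sites (assumption for illustration only).
lat = [30, 30, 30, 40, 40, 40, 50, 50, 50]
lon = [95, 105, 115] * 3

raw_product = [a * b for a, b in zip(lat, lon)]
c_lat, c_lon = center(lat), center(lon)
centered_product = [a * b for a, b in zip(c_lat, c_lon)]

r_before = pearson(lat, raw_product)         # lat vs lat x long, uncentered
r_after = pearson(c_lat, centered_product)   # same comparison after centering

print(f"correlation before centering: {r_before:.3f}")
print(f"correlation after centering:  {r_after:.3f}")
```

On this balanced grid the correlation between a centered predictor and the centered product is zero. With real, unbalanced data it is only reduced, which matches the tolerance for the interaction term rising from .002 to .820 in the coefficient tables. The coefficient, t and Sig. of the interaction term are unchanged by centering.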
Analysis of variance
Source of variation   SS                               df            MS
Regression            $\sum(\hat{y}_i - \bar{y})^2$    $p$           $\sum(\hat{y}_i - \bar{y})^2 / p$
Residual              $\sum(y_i - \hat{y}_i)^2$        $n - p - 1$   $\sum(y_i - \hat{y}_i)^2 / (n - p - 1)$
Total                 $\sum(y_i - \bar{y})^2$          $n - 1$
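The partition of the sums of squares can be checked numerically. The sketch below uses hypothetical data and a single predictor (p = 1) so that the fit has a closed form; `fit_line` and `anova_table` are illustrative names.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def anova_table(y, y_hat, p):
    """Return {source: (SS, df, MS)} for a regression with p predictors."""
    n = len(y)
    y_bar = sum(y) / n
    ss_reg = sum((f - y_bar) ** 2 for f in y_hat)
    ss_res = sum((o - f) ** 2 for o, f in zip(y, y_hat))
    ss_tot = sum((o - y_bar) ** 2 for o in y)
    return {
        "regression": (ss_reg, p, ss_reg / p),
        "residual": (ss_res, n - p - 1, ss_res / (n - p - 1)),
        "total": (ss_tot, n - 1, None),
    }

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * v for v in x]
table = anova_table(y, y_hat, p=1)

for source, (ss, df, ms) in table.items():
    print(source, round(ss, 4), df, None if ms is None else round(ms, 4))
```

For a least-squares fit with an intercept, the regression and residual sums of squares add up to the total, and so do their degrees of freedom.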
Matrix algebra approach to OLS estimation of multiple regression models
• $Y = X\beta + \varepsilon$
• $X'Xb = X'Y$
• $b = (X'X)^{-1}X'Y$
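A minimal sketch of the normal equations X'Xb = X'Y in plain Python. It solves the system by Gaussian elimination rather than forming the inverse explicitly, which gives the same b; the data are hypothetical and constructed so that the true coefficients are 1, 2 and 3. All function names are illustrative.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients b from the normal equations X'X b = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(v * w for v, w in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Hypothetical data: y = 1 + 2*x1 + 3*x2 exactly (no error term).
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
X = [[1.0, a, b] for a, b in zip(x1, x2)]  # first column of ones = intercept
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

b_hat = ols(X, y)
print([round(v, 6) for v in b_hat])
```

The singularity check is where collinearity shows up in practice: when predictors are nearly linear combinations of each other, the pivots become tiny and the solution unstable.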
Criteria for “best” fitting in multiple regression with p predictors.
Criterion: Formula
• r²: $r^2 = \frac{SS_{Regression}}{SS_{total}} = 1 - \frac{SS_{Residual}}{SS_{total}}$
• Adjusted r²: $1 - \frac{n - 1}{n - p - 1}(1 - r^2)$
• Akaike Information Criterion (AIC): [two formulas in terms of $n$, $p$ and $SS_{Residual}$; not legible in this transcript]
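The first two criteria can be sketched directly from their definitions, r² = 1 - SS_Residual / SS_total and adjusted r² = 1 - [(n - 1) / (n - p - 1)](1 - r²). The function names are illustrative. The checks use n = 73 and the rounded r² values from the model comparison table in this deck, so agreement is only expected to within rounding.

```python
def r_squared(ss_residual, ss_total):
    """Proportion of the total variation explained by the regression."""
    return 1.0 - ss_residual / ss_total

def adjusted_r2(r2, n, p):
    """r^2 penalized for the number of predictors p, given n observations."""
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)

n_sites = 73  # number of sites stated in this deck

# (model, r^2, number of predictors) taken from the deck's model table
for model, r2, p in [("Lon", 0.0006, 1), ("Lat", 0.47, 1),
                     ("Lon + Lat + Lon x Lat", 0.54, 3)]:
    print(f"{model:24s} adjusted r2 = {adjusted_r2(r2, n_sites, p):.3f}")
```

Unlike r², the adjusted value can fall when a weak predictor is added, and it can be negative, as for the longitude-only model.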
Hierarchical partitioning and model selection
No. of predictors   Model                     r²       Adj r²    P         AIC (R)
1                   Lon                       0.0006   -0.013    0.84      30.15
1                   Lat                       0.47     0.46      <0.001    -16.16
2                   Lon + Lat                 0.48     0.46      <0.001    -15.25
3                   Lon + Lat + Lon x Lat     0.54     0.52      <0.001    -22.55
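One common way to read a table like this is to rank the models by AIC, where smaller is better, and look at each model's difference from the best one. The sketch below simply reuses the AIC values printed in the table; the variable names are illustrative.

```python
# AIC values as printed in the model comparison table of this deck.
aic_by_model = {
    "Lon": 30.15,
    "Lat": -16.16,
    "Lon + Lat": -15.25,
    "Lon + Lat + Lon x Lat": -22.55,
}

# The preferred model is the one with the smallest AIC.
best_model = min(aic_by_model, key=aic_by_model.get)

# Delta AIC: how much worse each model is than the best one.
delta_aic = {m: a - aic_by_model[best_model] for m, a in aic_by_model.items()}

for model, delta in sorted(delta_aic.items(), key=lambda kv: kv[1]):
    print(f"{model:24s} delta AIC = {delta:6.2f}")
```

Adding longitude to the latitude-only model raises AIC slightly (-15.25 against -16.16), so longitude earns its place only through the interaction with latitude.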
[Figure: Longitude and Latitude as predictors of C3; R2 = 0.48]
Model Lat + Long
[Figure: four 3D surface plots of fitted values against centered longitude (cLONG) and centered latitude (cLAT). Two panels show the additive fit (Y_hats.longlat) and two show the interaction fit (Y_hats.longxlat), each drawn once with cLONG and once with cLAT on the front axis.]
[Figure: C3 grasses in North America. Relative abundance (0.0 to 1.0) against Longitude (95 to 120), with separate lines for 35 Lat and 45 Lat]
Model Lat * Long
The final model from forward selection is:

Step: AIC=-228.67
SQRT_C3 ~ LAT + MAP + JJAMAP + DJFMAP

         Df  Sum of Sq     RSS      AIC
<none>                  2.7759  -228.67
+ LONG    1  0.0209705  2.7549  -227.23
+ MAT     1  0.0001829  2.7757  -226.68

Call:
lm(formula = SQRT_C3 ~ LAT + MAP + JJAMAP + DJFMAP)

Coefficients:
(Intercept)         LAT         MAP      JJAMAP      DJFMAP
 -0.7892663   0.0391180   0.0001538  -0.8573419  -0.7503936
The final model from backward selection is:

Step: AIC=-229.32
SQRT_C3 ~ LAT + JJAMAP + DJFMAP

           Df  Sum of Sq     RSS      AIC
<none>                    2.8279  -229.32
- DJFMAP    1    0.26190  3.0898  -224.85
- JJAMAP    1    0.31489  3.1428  -223.61
- LAT       1    2.82772  5.6556  -180.72

Call:
lm(formula = SQRT_C3 ~ LAT + JJAMAP + DJFMAP)

Coefficients:
(Intercept)       LAT    JJAMAP    DJFMAP
   -0.53148   0.03748  -1.02823  -1.05164
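The AIC values in the two printouts appear consistent with the least-squares form n ln(RSS / n) + 2k, where k counts the estimated coefficients including the intercept. This is the form documented for R's `extractAIC` on `lm` fits. The sketch below checks that reading against the printed RSS values, taking n = 73 from the data slide; `step_aic` is an illustrative name and the reading itself is an assumption verified only against these numbers.

```python
import math

def step_aic(rss, n, k):
    """AIC in the least-squares form n*ln(RSS/n) + 2k (k includes the intercept)."""
    return n * math.log(rss / n) + 2 * k

n_sites = 73  # number of sites stated in this deck

# Forward selection: intercept + LAT + MAP + JJAMAP + DJFMAP -> k = 5
aic_forward = step_aic(2.7759, n_sites, 5)

# Backward selection: intercept + LAT + JJAMAP + DJFMAP -> k = 4
aic_backward = step_aic(2.8279, n_sites, 4)

# Candidate step from the forward printout: adding LONG -> k = 6
aic_forward_plus_long = step_aic(2.7549, n_sites, 6)

print(f"forward final:   {aic_forward:.2f}")
print(f"backward final:  {aic_backward:.2f}")
print(f"forward + LONG:  {aic_forward_plus_long:.2f}")
```

The two procedures stop at different models: forward selection keeps MAP, backward selection drops it, and the backward model has the slightly lower AIC (-229.32 against -228.67). In both printouts every remaining candidate step would raise AIC, which is why the search stops.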