8/10/2019 Multicolline.pdf
Multicollinearity in Regression
Principal Components Analysis
Standing Heights and Physical Stature Attributes Among Female Police Officer Applicants
S.Q. Lafi and J.B. Kaneene (1992). An Explanation of the Use of Principal Components Analysis to Detect and Correct for Multicollinearity, Preventive Veterinary Medicine, Vol. 13, pp. 261-275
Data Description
Subjects: 33 females applying for police officer positions
Dependent Variable: Y Standing Height (cm)
Independent Variables:
X1 Sitting Height (cm)
X2 Upper Arm Length (cm)
X3 Forearm Length (cm)
X4 Hand Length (cm)
X5 Upper Leg Length (cm)
X6 Lower Leg Length (cm)
X7 Foot Length (inches)
X8 BRACH (100X3/X2)
X9 TIBIO (100X6/X5)
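The definitions of X8 and X9 make them near-deterministic functions of other predictors, which is the source of the collinearity examined later. A minimal sketch, using only the first three rows of the data table, recomputes both ratios from X2, X3, X5, and X6; the variable names are illustrative and not from the source.

```python
import numpy as np

# First three subjects from the data table (cm).
x2 = np.array([31.8, 32.4, 33.6])  # upper arm length
x3 = np.array([28.1, 29.1, 29.5])  # forearm length
x5 = np.array([40.3, 43.3, 43.7])  # upper leg length
x6 = np.array([38.9, 42.7, 41.1])  # lower leg length

brach = 100 * x3 / x2  # X8 = 100 * X3 / X2
tibio = 100 * x6 / x5  # X9 = 100 * X6 / X5

print(np.round(brach, 1))
print(np.round(tibio, 1))
```

The recomputed values agree with the tabulated X8 and X9 to within rounding, so these two columns add almost no information beyond X2, X3, X5, and X6.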
Data
ID Y X1 X2 X3 X4 X5 X6 X7 X8 X9
1 165.8 88.7 31.8 28.1 18.7 40.3 38.9 6.7 88.4 96.5
2 169.8 90.0 32.4 29.1 18.3 43.3 42.7 6.4 89.8 98.6
3 170.7 87.7 33.6 29.5 20.7 43.7 41.1 7.2 87.8 94.1
4 170.9 87.1 31.0 28.2 18.6 43.7 40.6 6.7 91.0 92.9
5 157.5 81.3 32.1 27.3 17.5 38.1 39.6 6.6 85.0 103.9
6 165.9 88.2 31.8 29.0 18.6 42.0 40.6 6.5 91.2 96.7
7 158.7 86.1 30.6 27.8 18.4 40.0 37.0 5.9 90.8 92.5
8 166.0 88.7 30.2 26.9 17.5 41.6 39.0 5.9 89.1 93.8
9 158.7 83.7 31.1 27.1 18.1 38.9 37.5 6.1 87.1 96.4
10 161.5 81.2 32.3 27.8 19.1 42.8 40.1 6.2 86.1 93.7
11 167.3 88.6 34.8 27.3 18.3 43.1 41.8 7.3 78.4 97.0
12 167.4 83.2 34.3 30.1 19.2 43.4 42.2 6.8 87.8 97.2
13 159.2 81.5 31.0 27.3 17.5 39.8 39.6 4.9 88.1 99.5
14 170.0 87.9 34.2 30.9 19.4 43.1 43.7 6.3 90.4 101.4
15 166.3 88.3 30.6 28.8 18.3 41.8 41.0 5.9 94.1 98.1
16 169.0 85.6 32.6 28.8 19.1 42.7 42.0 6.0 88.3 98.4
17 156.2 81.6 31.0 25.6 17.0 44.2 39.0 5.1 82.6 88.2
18 159.6 86.6 32.7 25.4 17.7 42.0 37.5 5.0 77.7 89.3
19 155.0 82.0 30.3 26.6 17.3 37.9 36.1 5.2 87.8 95.3
20 161.1 84.1 29.5 26.6 17.8 38.6 38.2 5.9 90.2 99.0
21 170.3 88.1 34.0 29.3 18.2 43.2 41.4 5.9 86.2 95.8
22 167.8 83.9 32.5 28.6 20.2 43.3 42.9 7.2 88.0 99.1
23 163.1 88.1 31.7 26.9 18.1 40.1 39.0 5.9 84.9 97.3
24 165.8 87.0 33.2 26.3 19.5 43.2 40.7 5.9 79.2 94.2
25 175.4 89.6 35.2 30.1 19.1 45.1 44.5 6.3 85.5 98.7
26 159.8 85.6 31.5 27.1 19.2 42.3 39.0 5.7 86.0 92.2
27 166.0 84.9 30.5 28.1 17.8 41.2 43.0 6.1 92.1 104.4
28 161.2 84.1 32.8 29.2 18.4 42.6 41.1 5.9 89.0 96.5
29 160.4 84.3 30.5 27.8 16.8 41.0 39.8 6.0 91.1 97.1
30 164.3 85.0 35.0 27.8 19.0 47.2 42.4 5.0 79.4 89.8
31 165.5 82.6 36.2 28.6 20.2 45.0 42.3 5.6 79.0 94.0
32 167.2 85.0 33.6 27.1 19.8 46.0 41.6 5.6 80.7 90.4
33 167.2 83.4 33.5 29.7 19.4 45.2 44.0 5.2 88.7 97.3
Standardizing the Predictors
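$$X_{ij}^{*} = \frac{X_{ij} - \bar{X}_j}{\sqrt{n-1}\, S_j}, \qquad i = 1,\ldots,33; \quad j = 1,\ldots,9$$

$$S_j^2 = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2}{n-1}$$

$$X^{*} = \begin{bmatrix} X_{11}^{*} & X_{12}^{*} & \cdots & X_{19}^{*} \\ X_{21}^{*} & X_{22}^{*} & \cdots & X_{29}^{*} \\ \vdots & \vdots & & \vdots \\ X_{33,1}^{*} & X_{33,2}^{*} & \cdots & X_{33,9}^{*} \end{bmatrix}
\qquad
X^{*\prime} X^{*} = R = \begin{bmatrix} 1 & r_{12} & \cdots & r_{19} \\ r_{21} & 1 & \cdots & r_{29} \\ \vdots & \vdots & & \vdots \\ r_{91} & r_{92} & \cdots & 1 \end{bmatrix}$$

$$r_{jk} = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)}{\sqrt{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2 \, \sum_{i=1}^{n} (X_{ik} - \bar{X}_k)^2}}$$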
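The standardization on this slide centers each predictor and divides by sqrt(n-1) times its sample standard deviation, so that X*'X* equals the correlation matrix R. A minimal sketch of that transformation follows; the function name `correlation_transform` and the synthetic data are assumptions made for illustration, not part of the source.

```python
import numpy as np

def correlation_transform(X):
    """Center each column and scale by sqrt(n-1) * S_j so that X*'X* = R."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    s = X.std(axis=0, ddof=1)  # sample standard deviation S_j
    return centered / (np.sqrt(n - 1) * s)

# Hypothetical data: 33 rows, with the last column nearly a copy of the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(33, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=33)

X_star = correlation_transform(X)
R = X_star.T @ X_star  # should match the ordinary correlation matrix
print(np.round(R, 3))
```

Each column of `X_star` has mean zero and unit length, which is why the cross-product matrix has ones on the diagonal.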
Variance Inflation Factors (VIFs)
A VIF measures the extent to which a regression coefficient's variance is inflated due to correlations among the set of predictors.
$\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of multiple determination when $X_j$ is regressed on the remaining predictors.
Values > 10 are often considered to be problematic.
VIFs can be obtained as the diagonal elements of $R^{-1}$.
VIFs
X1 X2 X3 X4 X5 X6 X7 X8 X9
1.52 436.47 353.99 2.46 817.17 801.94 1.77 417.39 448.37
Not surprisingly, X2, X3, X5, X6, X8, and X9 are problems (see definitions of X8 and X9)
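The two routes to the VIFs stated above (the diagonal of the inverse correlation matrix, and 1/(1 - R_j^2) from regressing X_j on the other predictors) can be checked against each other. The sketch below does so on synthetic data that includes a ratio column in the spirit of X8; the function names and the data are hypothetical.

```python
import numpy as np

def vif_from_correlation(X):
    """VIFs as the diagonal elements of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def vif_by_regression(X, j):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the remaining columns."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

# Hypothetical predictors: a, b, c are independent; the fourth is 100 * b / a.
rng = np.random.default_rng(1)
a = rng.normal(30, 2, size=33)
b = rng.normal(28, 1.5, size=33)
c = rng.normal(18, 1, size=33)
X = np.column_stack([a, b, c, 100 * b / a])

vifs = vif_from_correlation(X)
print(np.round(vifs, 2))
```

In this toy setup the ratio column and its two ingredients pick up large VIFs while the unrelated column stays near 1, which mirrors the pattern reported for X2, X3, X5, X6, X8, and X9.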
Regression of Y on [1|X*]
$$E\{Y_i\} = \beta_0 + \beta_1 X_{i1}^{*} + \cdots + \beta_9 X_{i9}^{*} \qquad E\{Y\} = \beta_0 \mathbf{1} + X^{*}\beta$$
Regression Statistics
Multiple R 0.944825
R Square 0.892694
Adjusted R Square 0.850704
Standard Error 1.890412
Observations 33
ANOVA
df SS MS F Significance F
Regression 9 683.7823 75.9758 21.2600 0.0000
Residual 23 82.1941 3.5737
Total 32 765.9764
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 164.5636 0.3291 500.0743 0.0000 163.8829 165.2444
X1* 11.8900 2.3307 5.1015 0.0000 7.0686 16.7114
X2* 4.2752 39.4941 0.1082 0.9147 -77.4246 85.9751
X3* -3.2845 35.5676 -0.0923 0.9272 -76.8616 70.2927
X4* 4.2764 2.9629 1.4433 0.1624 -1.8528 10.4057
X5* -9.8372 54.0398 -0.1820 0.8571 -121.6270 101.9525
X6* 25.5626 53.5337 0.4775 0.6375 -85.1802 136.3055
X7* 3.3805 2.5166 1.3433 0.1923 -1.8255 8.5865
X8* 6.3735 38.6215 0.1650 0.8704 -73.5211 86.2682
X9* -9.6391 40.0289 -0.2408 0.8118 -92.4453 73.1670
Note the surprising negative coefficients for X3*, X5*, and X9*
Principal Components Analysis
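Using a statistical or matrix computer package, decompose the correlation matrix into its eigenvalues and eigenvectors:

$$R = X^{*\prime} X^{*} = \sum_{j=1}^{p} \lambda_j v_j v_j' = V L V'$$

where $\lambda_j$ is the $j$th eigenvalue and $v_j = (v_{j1}, \ldots, v_{jp})'$ is the $j$th eigenvector,

$$V = [\, v_1 \mid v_2 \mid \cdots \mid v_p \,] \qquad L = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix}$$

subject to: $v_j' v_j = 1$ and $v_j' v_k = 0$ for $j \ne k$
Condition Index: $(\lambda_{\max} / \lambda_j)^{1/2}$
Principal Components: $W = X^{*} V$
While the columns of X* are highly correlated, the columns of W are uncorrelated.
The eigenvalues $\lambda_j$ represent the variance corresponding to each principal component.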
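The decomposition, the condition index, and the claim that the principal components are uncorrelated can all be checked numerically. The sketch below uses `numpy.linalg.eigh` on synthetic data with one near-linear dependency; the data and variable names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical predictors: the last column is nearly the sum of the first two.
rng = np.random.default_rng(2)
n, p = 33, 4
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + X[:, 1] + 0.05 * rng.normal(size=n)

# Standardize so that X*'X* is the correlation matrix R.
X_star = (X - X.mean(axis=0)) / (np.sqrt(n - 1) * X.std(axis=0, ddof=1))
R = X_star.T @ X_star

# eigh returns eigenvalues in ascending order; reorder to descending.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
V = eigvecs[:, order]
L = np.diag(lam)

W = X_star @ V                       # principal components
cond_index = np.sqrt(lam[0] / lam)   # sqrt(lambda_max / lambda_j)

print(np.round(lam, 4))
print(np.round(cond_index, 1))
```

Because W'W = V'RV = L is diagonal, the columns of W are mutually uncorrelated, and the near dependency shows up as one very small eigenvalue with a large condition index.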
Police Applicants Height Data - I
V
0.1853 0.1523 0.8017 0.2782 -0.3707 -0.2327 0.1754 -0.0005 0.0104
0.4413 -0.2348 -0.0986 -0.2312 -0.2551 -0.3191 -0.3973 0.5850 -0.1414
0.3934 0.3336 -0.1642 0.2336 0.1239 -0.3183 -0.4953 -0.5205 0.1397
0.4182 -0.0813 0.0284 -0.2063 0.5765 -0.3703 0.5529 0.0009 0.0040
0.4125 -0.3000 -0.0121 0.3508 0.0559 0.4669 0.0250 0.1487 0.6106
0.4645 0.1011 -0.2518 0.1658 -0.2697 0.3798 0.2786 -0.1539 -0.6040
0.2141 0.3577 0.3790 -0.5862 0.2139 0.4811 -0.2484 0.0009 -0.0022
-0.0852 0.5467 -0.0498 0.4536 0.3674 0.0367 -0.0418 0.5738 -0.1352
0.0474 0.5261 -0.3320 -0.2685 -0.4396 -0.1027 0.3445 0.1089 0.4521
L
3.6304 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 2.4427 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 1.0145 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.7656 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.6109 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000 0.3024 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.2322 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0009 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0005
Police Applicants Height Data - II
VLV'
1.0000 0.1441 0.2791 0.1483 0.1863 0.2263 0.3680 0.1147 0.0212
0.1441 1.0000 0.4708 0.6452 0.7160 0.6617 0.1468 -0.5820 -0.0985
0.2791 0.4708 1.0000 0.5051 0.3658 0.7284 0.4277 0.4420 0.4406
0.1483 0.6452 0.5051 1.0000 0.6007 0.5500 0.3471 -0.1911 -0.0988
0.1863 0.7160 0.3658 0.6007 1.0000 0.7150 -0.0298 -0.3882 -0.4098
0.2263 0.6617 0.7284 0.5500 0.7150 1.0000 0.2821 0.0026 0.3434
0.3680 0.1468 0.4277 0.3471 -0.0298 0.2821 1.0000 0.2445 0.3971
0.1147 -0.5820 0.4420 -0.1911 -0.3882 0.0026 0.2445 1.0000 0.5083
0.0212 -0.0985 0.4406 -0.0988 -0.4098 0.3434 0.3971 0.5083 1.0000
R
1.0000 0.1441 0.2791 0.1483 0.1863 0.2264 0.3680 0.1147 0.0212
0.1441 1.0000 0.4708 0.6452 0.7160 0.6616 0.1468 -0.5820 -0.0984
0.2791 0.4708 1.0000 0.5050 0.3658 0.7284 0.4277 0.4420 0.4406
0.1483 0.6452 0.5050 1.0000 0.6007 0.5500 0.3471 -0.1911 -0.0988
0.1863 0.7160 0.3658 0.6007 1.0000 0.7150 -0.0298 -0.3882 -0.4099
0.2264 0.6616 0.7284 0.5500 0.7150 1.0000 0.2821 0.0026 0.3434
0.3680 0.1468 0.4277 0.3471 -0.0298 0.2821 1.0000 0.2445 0.3971
0.1147 -0.5820 0.4420 -0.1911 -0.3882 0.0026 0.2445 1.0000 0.5082
0.0212 -0.0984 0.4406 -0.0988 -0.4099 0.3434 0.3971 0.5082 1.0000
Regression of Y on [1|W]
Regression Statistics
Multiple R 0.944825
R Square 0.892694
Adjusted R Square 0.850704
Standard Error 1.890412
Observations 33
ANOVA
df SS MS F Significance F
Regression 9 683.7823 75.9758 21.2600 0.0000
Residual 23 82.1941 3.5737
Total 32 765.9764
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 164.5636 0.3291 500.0743 0.0000 163.8829 165.2444
W1 12.1269 0.9922 12.2227 0.0000 10.0744 14.1793
W2 4.5224 1.2096 3.7389 0.0011 2.0202 7.0245
W3 7.6160 1.8769 4.0578 0.0005 3.7334 11.4985
W4 4.9552 2.1605 2.2935 0.0313 0.4858 9.4246
W5 -3.5819 2.4185 -1.4810 0.1522 -8.5850 1.4213
W6 3.2973 3.4376 0.9592 0.3474 -3.8139 10.4085
W7 6.8268 3.9230 1.7402 0.0952 -1.2885 14.9422
W8 1.4226 64.0508 0.0222 0.9825 -131.0766 133.9219
W9 -27.5954 87.0588 -0.3170 0.7541 -207.6903 152.4995
Note that W8 and W9 have very small eigenvalues and very small t-statistics.
Condition indices are 63.5 and 85.2, both well above 10.
$$E\{Y\} = \beta_0 \mathbf{1} + W\gamma$$
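The condition indices and the standard errors in this output follow directly from the eigenvalues. Since W'W = L and the columns of W are centered, the standard error of the j-th component coefficient is s / sqrt(lambda_j), where s is the residual standard error. A rough check using the rounded eigenvalues from the L matrix and s = 1.890412 from the regression statistics is sketched below; the variable names are illustrative.

```python
import numpy as np

# Eigenvalues as printed on the slide (rounded to 4 decimals).
lam = np.array([3.6304, 2.4427, 1.0145, 0.7656, 0.6109,
                0.3024, 0.2322, 0.0009, 0.0005])
s = 1.890412  # residual standard error of the full model

cond_index = np.sqrt(lam.max() / lam)  # sqrt(lambda_max / lambda_j)
se_gamma = s / np.sqrt(lam)            # standard error of each W coefficient

print(np.round(cond_index[-2:], 1))
print(np.round(se_gamma[:7], 4))
```

The first seven standard errors reproduce the tabulated values closely. For W8 and W9 the result is only approximate, because eigenvalues of 0.0009 and 0.0005 carry a single significant digit after rounding.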
Reduced Model
Remove the last 2 principal components due to small, insignificant t-statistics and high condition indices.
Let V(g) be the p×g matrix of the eigenvectors for the g retained principal components (p=9, g=7).
Let W(g) = X*V(g).
Then regress Y on [1|W(g)].
V(g)
0.1853 0.1523 0.8017 0.2782 -0.3707 -0.2327 0.1754
0.4413 -0.2348 -0.0986 -0.2312 -0.2551 -0.3191 -0.3973
0.3934 0.3336 -0.1642 0.2336 0.1239 -0.3183 -0.4953
0.4182 -0.0813 0.0284 -0.2063 0.5765 -0.3703 0.5529
0.4125 -0.3000 -0.0121 0.3508 0.0559 0.4669 0.0250
0.4645 0.1011 -0.2518 0.1658 -0.2697 0.3798 0.2786
0.2141 0.3577 0.3790 -0.5862 0.2139 0.4811 -0.2484
-0.0852 0.5467 -0.0498 0.4536 0.3674 0.0367 -0.0418
0.0474 0.5261 -0.3320 -0.2685 -0.4396 -0.1027 0.3445
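The reduced-model procedure (keep the g leading eigenvectors, form W(g) = X*V(g), regress Y on [1|W(g)]) can be sketched as a small function. This is an illustration on synthetic data, not the computation behind the slides; `pcr_fit` and all data below are hypothetical. It uses the fact that the retained components are centered and mutually orthogonal, so each coefficient can be computed separately.

```python
import numpy as np

def pcr_fit(X, y, g):
    """Regress y on the first g principal components of the standardized X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    X_star = (X - X.mean(axis=0)) / (np.sqrt(n - 1) * X.std(axis=0, ddof=1))
    lam, V = np.linalg.eigh(X_star.T @ X_star)
    keep = np.argsort(lam)[::-1][:g]       # indices of the g largest eigenvalues
    lam_g, V_g = lam[keep], V[:, keep]
    W_g = X_star @ V_g                     # retained components, W_g'W_g = diag(lam_g)
    b0 = y.mean()                          # intercept, since W_g columns are centered
    gamma = (W_g.T @ (y - b0)) / lam_g     # least squares on orthogonal columns
    resid = y - b0 - W_g @ gamma
    s2 = resid @ resid / (n - g - 1)       # residual mean square
    return b0, gamma, V_g, lam_g, s2

# Hypothetical data with one near-linear dependency among four predictors.
rng = np.random.default_rng(3)
n = 33
X = rng.normal(size=(n, 4))
X[:, 3] = X[:, 0] - X[:, 1] + 0.05 * rng.normal(size=n)
y = 165 + 2 * X[:, 0] + X[:, 2] + rng.normal(size=n)

b0, gamma, V_g, lam_g, s2 = pcr_fit(X, y, g=3)
beta_g = V_g @ gamma  # coefficients on the standardized X* scale
print(np.round(beta_g, 3))
```

With g equal to the number of predictors the function reproduces ordinary least squares on X*; dropping the smallest component trades a small amount of fit for much more stable coefficients.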
Reduced Regression Fit
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.944575
R Square 0.892223
Adjusted R Square 0.862045
Standard Error 1.817195
Observations 33
ANOVA
df SS MS F Significance F
Regression 7 683.4215 97.6316 29.5657 0.0000
Residual 25 82.5549 3.3022
Total 32 765.9764
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 164.5636 0.3163 520.2229 0.0000 163.9121 165.2151
W1 12.1268 0.9537 12.7151 0.0000 10.1625 14.0910
W2 4.5224 1.1627 3.8895 0.0007 2.1277 6.9170
W3 7.6160 1.8042 4.2213 0.0003 3.9002 11.3317
W4 4.9551 2.0768 2.3859 0.0249 0.6777 9.2324
W5 -3.5819 2.3249 -1.5407 0.1360 -8.3701 1.2063
W6 3.2972 3.3044 0.9978 0.3279 -3.5084 10.1028
W7 6.8268 3.7711 1.8103 0.0823 -0.9398 14.5934
Transforming Back to X-scale
$$\hat{\beta}_{(g)} = V_{(g)} \hat{\gamma}_{(g)} \qquad s^2\{\hat{\beta}_{(g)}\} = s^2 \, V_{(g)} L_{(g)}^{-1} V_{(g)}'$$
s^2 = 3.3022
gamma-hat(g) beta-hat(g) StdErr
W1 12.1268 X1* 12.1779 2.0639
W2 4.5224 X2* -0.4583 2.0549
W3 7.6160 X3* 1.3113 2.3006
W4 4.9551 X4* 4.3866 2.8275
W5 -3.5819 X5* 6.8020 1.7926
W6 3.2972 X6* 9.1146 1.8993
W7 6.8268 X7* 3.3197 2.4118
X8* 1.8268 1.4407
X9* 2.6829 1.9731
V{beta-hat(g)}
4.2598 -0.1779 -0.6883 1.0454 -0.8386 -0.0887 -1.8757 -0.4214 0.9289
-0.1779 4.2228 3.6089 -2.2379 -1.9307 -2.4561 -0.1330 -1.0423 -0.7562
-0.6883 3.6089 5.2928 -2.3318 -1.3892 -2.9496 -0.3347 1.1128 -2.2031
1.0454 -2.2379 -2.3318 7.9948 -1.6401 -0.1911 -2.6329 0.1667 1.9223
-0.8386 -1.9307 -1.3892 -1.6401 3.2135 2.3480 1.4626 0.7180 -1.1223
-0.0887 -2.4561 -2.9496 -0.1911 2.3480 3.6074 0.1090 -0.1452 1.7520
-1.8757 -0.1330 -0.3347 -2.6329 1.4626 0.1090 5.8170 -0.1949 -1.7317
-0.4214 -1.0423 1.1128 0.1667 0.7180 -0.1452 -0.1949 2.0755 -1.2055
0.9289 -0.7562 -2.2031 1.9223 -1.1223 1.7520 -1.7317 -1.2055 3.8931
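The back-transformation can be reproduced from numbers already shown on the slides: the V(g) matrix, the first seven eigenvalues, the reduced-model coefficients, and s^2 = 3.3022. The sketch below multiplies V(g) by gamma-hat(g) to obtain beta-hat(g) and uses s^2 V(g) L(g)^{-1} V(g)' for the variance-covariance matrix. Inputs are rounded to four decimals, so results agree with the table only to about three decimals; the variable names are illustrative.

```python
import numpy as np

# V(g): eigenvectors of the 7 retained components (rows X1*..X9*), from the slide.
V_g = np.array([
    [ 0.1853,  0.1523,  0.8017,  0.2782, -0.3707, -0.2327,  0.1754],
    [ 0.4413, -0.2348, -0.0986, -0.2312, -0.2551, -0.3191, -0.3973],
    [ 0.3934,  0.3336, -0.1642,  0.2336,  0.1239, -0.3183, -0.4953],
    [ 0.4182, -0.0813,  0.0284, -0.2063,  0.5765, -0.3703,  0.5529],
    [ 0.4125, -0.3000, -0.0121,  0.3508,  0.0559,  0.4669,  0.0250],
    [ 0.4645,  0.1011, -0.2518,  0.1658, -0.2697,  0.3798,  0.2786],
    [ 0.2141,  0.3577,  0.3790, -0.5862,  0.2139,  0.4811, -0.2484],
    [-0.0852,  0.5467, -0.0498,  0.4536,  0.3674,  0.0367, -0.0418],
    [ 0.0474,  0.5261, -0.3320, -0.2685, -0.4396, -0.1027,  0.3445],
])
lam_g = np.array([3.6304, 2.4427, 1.0145, 0.7656, 0.6109, 0.3024, 0.2322])
gamma_hat = np.array([12.1268, 4.5224, 7.6160, 4.9551, -3.5819, 3.2972, 6.8268])
s2 = 3.3022  # residual mean square of the reduced model

beta_hat = V_g @ gamma_hat               # beta-hat(g) = V(g) gamma-hat(g)
cov_beta = s2 * (V_g / lam_g) @ V_g.T    # s^2 V(g) L(g)^{-1} V(g)'
se_beta = np.sqrt(np.diag(cov_beta))

print(np.round(beta_hat, 4))
print(np.round(se_beta, 4))
```

Dividing `V_g` by `lam_g` scales each column by the reciprocal of its eigenvalue, which is the same as multiplying by the inverse of the diagonal matrix L(g).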
Comparison of Coefficients and SEs
Original Model
Coefficients Standard Error
Intercept 164.5636 0.3291
X1* 11.8900 2.3307
X2* 4.2752 39.4941
X3* -3.2845 35.5676
X4* 4.2764 2.9629
X5* -9.8372 54.0398
X6* 25.5626 53.5337
X7* 3.3805 2.5166
X8* 6.3735 38.6215
X9* -9.6391 40.0289
Principal Components
beta-hat(g) StdErr
X1* 12.1779 2.0639
X2* -0.4583 2.0549
X3* 1.3113 2.3006
X4* 4.3866 2.8275
X5* 6.8020 1.7926
X6* 9.1146 1.8993
X7* 3.3197 2.4118
X8* 1.8268 1.4407
X9* 2.6829 1.9731