Multicollinearity in Regression Principal Components Analysis
Standing Heights and Physical Stature Attributes Among Female Police Officer Applicants
S.Q. Lafi and J.B. Kaneene (1992). “An Explanation of the Use of Principal Components Analysis to Detect and Correct for Multicollinearity,” Preventive Veterinary Medicine, Vol. 13, pp. 261-275
Data Description
• Subjects: 33 females applying for police officer positions
• Dependent variable: Y ≡ Standing Height (cm)
• Independent variables:
    X1 ≡ Sitting Height (cm)
    X2 ≡ Upper Arm Length (cm)
    X3 ≡ Forearm Length (cm)
    X4 ≡ Hand Length (cm)
    X5 ≡ Upper Leg Length (cm)
    X6 ≡ Lower Leg Length (cm)
    X7 ≡ Foot Length (inches)
    X8 ≡ BRACH (100·X3/X2)
    X9 ≡ TIBIO (100·X6/X5)
Data
ID      Y     X1    X2    X3    X4    X5    X6   X7    X8     X9
 1  165.8   88.7  31.8  28.1  18.7  40.3  38.9  6.7  88.4   96.5
 2  169.8   90.0  32.4  29.1  18.3  43.3  42.7  6.4  89.8   98.6
 3  170.7   87.7  33.6  29.5  20.7  43.7  41.1  7.2  87.8   94.1
 4  170.9   87.1  31.0  28.2  18.6  43.7  40.6  6.7  91.0   92.9
 5  157.5   81.3  32.1  27.3  17.5  38.1  39.6  6.6  85.0  103.9
 6  165.9   88.2  31.8  29.0  18.6  42.0  40.6  6.5  91.2   96.7
 7  158.7   86.1  30.6  27.8  18.4  40.0  37.0  5.9  90.8   92.5
 8  166.0   88.7  30.2  26.9  17.5  41.6  39.0  5.9  89.1   93.8
 9  158.7   83.7  31.1  27.1  18.1  38.9  37.5  6.1  87.1   96.4
10  161.5   81.2  32.3  27.8  19.1  42.8  40.1  6.2  86.1   93.7
11  167.3   88.6  34.8  27.3  18.3  43.1  41.8  7.3  78.4   97.0
12  167.4   83.2  34.3  30.1  19.2  43.4  42.2  6.8  87.8   97.2
13  159.2   81.5  31.0  27.3  17.5  39.8  39.6  4.9  88.1   99.5
14  170.0   87.9  34.2  30.9  19.4  43.1  43.7  6.3  90.4  101.4
15  166.3   88.3  30.6  28.8  18.3  41.8  41.0  5.9  94.1   98.1
16  169.0   85.6  32.6  28.8  19.1  42.7  42.0  6.0  88.3   98.4
17  156.2   81.6  31.0  25.6  17.0  44.2  39.0  5.1  82.6   88.2
18  159.6   86.6  32.7  25.4  17.7  42.0  37.5  5.0  77.7   89.3
19  155.0   82.0  30.3  26.6  17.3  37.9  36.1  5.2  87.8   95.3
20  161.1   84.1  29.5  26.6  17.8  38.6  38.2  5.9  90.2   99.0
21  170.3   88.1  34.0  29.3  18.2  43.2  41.4  5.9  86.2   95.8
22  167.8   83.9  32.5  28.6  20.2  43.3  42.9  7.2  88.0   99.1
23  163.1   88.1  31.7  26.9  18.1  40.1  39.0  5.9  84.9   97.3
24  165.8   87.0  33.2  26.3  19.5  43.2  40.7  5.9  79.2   94.2
25  175.4   89.6  35.2  30.1  19.1  45.1  44.5  6.3  85.5   98.7
26  159.8   85.6  31.5  27.1  19.2  42.3  39.0  5.7  86.0   92.2
27  166.0   84.9  30.5  28.1  17.8  41.2  43.0  6.1  92.1  104.4
28  161.2   84.1  32.8  29.2  18.4  42.6  41.1  5.9  89.0   96.5
29  160.4   84.3  30.5  27.8  16.8  41.0  39.8  6.0  91.1   97.1
30  164.3   85.0  35.0  27.8  19.0  47.2  42.4  5.0  79.4   89.8
31  165.5   82.6  36.2  28.6  20.2  45.0  42.3  5.6  79.0   94.0
32  167.2   85.0  33.6  27.1  19.8  46.0  41.6  5.6  80.7   90.4
33  167.2   83.4  33.5  29.7  19.4  45.2  44.0  5.2  88.7   97.3
Standardizing the Predictors
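$$X_{ij}^{*} = \frac{1}{\sqrt{n-1}}\left(\frac{X_{ij}-\bar{X}_j}{S_j}\right), \qquad i = 1,\dots,33;\quad j = 1,\dots,9$$

$$\bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} X_{ij}, \qquad S_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_j\right)^2$$

$$\mathbf{X}^{*} = \begin{bmatrix} X_{11}^{*} & X_{12}^{*} & \cdots & X_{19}^{*} \\ X_{21}^{*} & X_{22}^{*} & \cdots & X_{29}^{*} \\ \vdots & \vdots & & \vdots \\ X_{33,1}^{*} & X_{33,2}^{*} & \cdots & X_{33,9}^{*} \end{bmatrix}
\qquad
\mathbf{X}^{*\prime}\mathbf{X}^{*} = \mathbf{R} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{19} \\ r_{21} & 1 & \cdots & r_{29} \\ \vdots & \vdots & & \vdots \\ r_{91} & r_{92} & \cdots & 1 \end{bmatrix}$$

$$r_{jk} = \frac{\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_j\right)\left(X_{ik}-\bar{X}_k\right)}{\sqrt{\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_j\right)^2 \sum_{i=1}^{n}\left(X_{ik}-\bar{X}_k\right)^2}}$$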
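The standardization on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `correlation_transform` is hypothetical, and the demo uses only X1 to X3 for the first five applicants from the data table, so the resulting matrix is not the 9×9 R of the slides.

```python
import numpy as np

def correlation_transform(X):
    """Center each column and divide by sqrt(n - 1) * S_j (sample SD)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    s = X.std(axis=0, ddof=1)  # S_j, with divisor n - 1
    return centered / (np.sqrt(n - 1) * s)

# X1, X2, X3 for applicants 1-5, copied from the data table (illustrative subset)
X_demo = np.array([
    [88.7, 31.8, 28.1],
    [90.0, 32.4, 29.1],
    [87.7, 33.6, 29.5],
    [87.1, 31.0, 28.2],
    [81.3, 32.1, 27.3],
])

X_star = correlation_transform(X_demo)
R_demo = X_star.T @ X_star  # X*'X* is the correlation matrix of the columns
print(np.round(R_demo, 4))
```

Because every standardized column has mean 0 and unit length, X*'X* has ones on the diagonal and the pairwise correlations off the diagonal.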
Correlation Matrix of Predictors and Its Inverse

R
 1.0000   0.1441   0.2791   0.1483   0.1863   0.2264   0.3680   0.1147   0.0212
 0.1441   1.0000   0.4708   0.6452   0.7160   0.6616   0.1468  -0.5820  -0.0984
 0.2791   0.4708   1.0000   0.5050   0.3658   0.7284   0.4277   0.4420   0.4406
 0.1483   0.6452   0.5050   1.0000   0.6007   0.5500   0.3471  -0.1911  -0.0988
 0.1863   0.7160   0.3658   0.6007   1.0000   0.7150  -0.0298  -0.3882  -0.4099
 0.2264   0.6616   0.7284   0.5500   0.7150   1.0000   0.2821   0.0026   0.3434
 0.3680   0.1468   0.4277   0.3471  -0.0298   0.2821   1.0000   0.2445   0.3971
 0.1147  -0.5820   0.4420  -0.1911  -0.3882   0.0026   0.2445   1.0000   0.5082
 0.0212  -0.0984   0.4406  -0.0988  -0.4099   0.3434   0.3971   0.5082   1.0000

R^(-1)
   1.52    -3.48     3.15     0.41    13.15   -13.28    -0.62    -3.41    10.21
  -3.48   436.47  -390.31    -1.26   -83.83    77.01     1.18   425.55   -62.66
   3.15  -390.31   353.99    -0.07    91.67   -87.90    -1.25  -382.59    68.23
   0.41    -1.26    -0.07     2.46     4.89    -5.40    -0.81    -0.49     4.57
  13.15   -83.83    91.67     4.89   817.17  -807.75    -2.21   -76.90   603.81
 -13.28    77.01   -87.90    -5.40  -807.75   801.94     2.65    71.74  -597.88
  -0.62     1.18    -1.25    -0.81    -2.21     2.65     1.77     1.12    -2.49
  -3.41   425.55  -382.59    -0.49   -76.90    71.74     1.12   417.39   -58.24
  10.21   -62.66    68.23     4.57   603.81  -597.88    -2.49   -58.24   448.37
Variance Inflation Factors (VIFs)
• VIF measures the extent to which a regression coefficient’s variance is inflated due to correlations among the set of predictors
• VIF_j = 1/(1 − R_j^2), where R_j^2 is the coefficient of multiple determination when X_j is regressed on the remaining predictors
• Values > 10 are often considered to be problematic
• VIFs can be obtained as the diagonal elements of R^(-1)

VIFs
   X1       X2       X3      X4       X5       X6      X7       X8       X9
 1.52   436.47   353.99    2.46   817.17   801.94    1.77   417.39   448.37
Not surprisingly, X2, X3, X5, X6, X8, and X9 are problems (see definitions of X8 and X9)
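As a rough check, the VIFs can be read off the diagonal of the inverse correlation matrix. The sketch below types in R as printed on the slide; because those entries are rounded to four decimals and R is nearly singular, the computed VIFs should be expected to match the slide's values only approximately. The variable names are illustrative.

```python
import numpy as np

# Correlation matrix R of X1..X9 as printed on the slide (rounded to 4 decimals)
R = np.array([
    [1.0000,  0.1441, 0.2791,  0.1483,  0.1863, 0.2264,  0.3680,  0.1147,  0.0212],
    [0.1441,  1.0000, 0.4708,  0.6452,  0.7160, 0.6616,  0.1468, -0.5820, -0.0984],
    [0.2791,  0.4708, 1.0000,  0.5050,  0.3658, 0.7284,  0.4277,  0.4420,  0.4406],
    [0.1483,  0.6452, 0.5050,  1.0000,  0.6007, 0.5500,  0.3471, -0.1911, -0.0988],
    [0.1863,  0.7160, 0.3658,  0.6007,  1.0000, 0.7150, -0.0298, -0.3882, -0.4099],
    [0.2264,  0.6616, 0.7284,  0.5500,  0.7150, 1.0000,  0.2821,  0.0026,  0.3434],
    [0.3680,  0.1468, 0.4277,  0.3471, -0.0298, 0.2821,  1.0000,  0.2445,  0.3971],
    [0.1147, -0.5820, 0.4420, -0.1911, -0.3882, 0.0026,  0.2445,  1.0000,  0.5082],
    [0.0212, -0.0984, 0.4406, -0.0988, -0.4099, 0.3434,  0.3971,  0.5082,  1.0000],
])

# VIF_j is the j-th diagonal element of R^(-1)
vifs = np.diag(np.linalg.inv(R))
labels = [f"X{j}" for j in range(1, 10)]

for name, v in zip(labels, vifs):
    print(f"{name}: {v:9.2f}")

# Apply the "VIF > 10" rule of thumb from the slide
flagged = [name for name, v in zip(labels, vifs) if v > 10]
print("VIF > 10:", flagged)
```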
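Regression of Y on [1|X*]

$$E\{Y_i\} = \beta_0 + \beta_1 X_{i1}^{*} + \cdots + \beta_9 X_{i9}^{*} \qquad\Longleftrightarrow\qquad E\{\mathbf{Y}\} = \beta_0\mathbf{1} + \mathbf{X}^{*}\boldsymbol{\beta}$$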
Regression Statistics
Multiple R           0.944825
R Square             0.892694
Adjusted R Square    0.850704
Standard Error       1.890412
Observations         33

ANOVA
             df        SS        MS         F    Significance F
Regression    9  683.7823   75.9758   21.2600            0.0000
Residual     23   82.1941    3.5737
Total        32  765.9764

            Coefficients  Standard Error    t Stat  P-value  Lower 95%  Upper 95%
Intercept       164.5636          0.3291  500.0743   0.0000   163.8829   165.2444
X1*              11.8900          2.3307    5.1015   0.0000     7.0686    16.7114
X2*               4.2752         39.4941    0.1082   0.9147   -77.4246    85.9751
X3*              -3.2845         35.5676   -0.0923   0.9272   -76.8616    70.2927
X4*               4.2764          2.9629    1.4433   0.1624    -1.8528    10.4057
X5*              -9.8372         54.0398   -0.1820   0.8571  -121.6270   101.9525
X6*              25.5626         53.5337    0.4775   0.6375   -85.1802   136.3055
X7*               3.3805          2.5166    1.3433   0.1923    -1.8255     8.5865
X8*               6.3735         38.6215    0.1650   0.8704   -73.5211    86.2682
X9*              -9.6391         40.0289   -0.2408   0.8118   -92.4453    73.1670
Note the surprising negative coefficients for X3*, X5*, and X9*
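The inflated standard errors in this fit can be tied back to the VIFs. With the predictors standardized so that X*'X* = R, the estimated covariance matrix of the slope estimates is s² R^(-1), so each standard error is the residual standard error times the square root of the corresponding VIF. The worked values below use s = 1.890412 from the regression statistics and the VIFs listed earlier; the notation s{b_j} is an assumption, as the slides do not name this quantity.

```latex
\[
s^2\{\mathbf{b}\} = s^2\left(\mathbf{X}^{*\prime}\mathbf{X}^{*}\right)^{-1} = s^2\,\mathbf{R}^{-1}
\quad\Longrightarrow\quad
s\{b_j\} = s\sqrt{\mathrm{VIF}_j}
\]
\[
s\{b_1\} = 1.8904\sqrt{1.52} \approx 2.33,
\qquad
s\{b_2\} = 1.8904\sqrt{436.47} \approx 39.49,
\qquad
s\{b_5\} = 1.8904\sqrt{817.17} \approx 54.04
\]
```

These agree with the reported standard errors for X1*, X2*, and X5* up to rounding.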
Principal Components Analysis
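Using a statistical or matrix computer package, decompose the correlation matrix into its eigenvalues and eigenvectors:

$$\mathbf{R} = \mathbf{X}^{*\prime}\mathbf{X}^{*} = \sum_{j=1}^{p} \lambda_j \mathbf{v}_j \mathbf{v}_j' = \mathbf{V}\mathbf{L}\mathbf{V}'$$

where $\lambda_j$ is the $j$th eigenvalue and $\mathbf{v}_j = (v_{j1}, \dots, v_{jp})'$ is the $j$th eigenvector, with

$$\mathbf{V} = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_p \end{bmatrix}
\qquad
\mathbf{L} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix}$$

subject to: $\mathbf{v}_j'\mathbf{v}_j = 1$ and $\mathbf{v}_j'\mathbf{v}_k = 0$ for $j \ne k$.

Condition Index: $\theta_j = \sqrt{\lambda_{\max}/\lambda_j}$

Principal Components: $\mathbf{W} = \mathbf{X}^{*}\mathbf{V}$

While the columns of X* are highly correlated, the columns of W are uncorrelated. The $\lambda_j$s represent the variance corresponding to each principal component.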
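The decomposition described above can be sketched with NumPy's symmetric eigensolver. The matrix R is typed in from the slide's four-decimal printout, so the two smallest eigenvalues (and their condition indices) should be expected to agree with the slides only roughly. Eigenvector signs are arbitrary and may differ from those on the slides.

```python
import numpy as np

# Correlation matrix R of X1..X9 as printed on the slide (rounded to 4 decimals)
R = np.array([
    [1.0000,  0.1441, 0.2791,  0.1483,  0.1863, 0.2264,  0.3680,  0.1147,  0.0212],
    [0.1441,  1.0000, 0.4708,  0.6452,  0.7160, 0.6616,  0.1468, -0.5820, -0.0984],
    [0.2791,  0.4708, 1.0000,  0.5050,  0.3658, 0.7284,  0.4277,  0.4420,  0.4406],
    [0.1483,  0.6452, 0.5050,  1.0000,  0.6007, 0.5500,  0.3471, -0.1911, -0.0988],
    [0.1863,  0.7160, 0.3658,  0.6007,  1.0000, 0.7150, -0.0298, -0.3882, -0.4099],
    [0.2264,  0.6616, 0.7284,  0.5500,  0.7150, 1.0000,  0.2821,  0.0026,  0.3434],
    [0.3680,  0.1468, 0.4277,  0.3471, -0.0298, 0.2821,  1.0000,  0.2445,  0.3971],
    [0.1147, -0.5820, 0.4420, -0.1911, -0.3882, 0.0026,  0.2445,  1.0000,  0.5082],
    [0.0212, -0.0984, 0.4406, -0.0988, -0.4099, 0.3434,  0.3971,  0.5082,  1.0000],
])

# eigh is for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]      # re-sort so that lambda_1 is the largest
lam = eigvals[order]
V = eigvecs[:, order]                  # columns are the eigenvectors v_j
L = np.diag(lam)

cond_index = np.sqrt(lam[0] / lam)     # theta_j = sqrt(lambda_max / lambda_j)

print("eigenvalues:      ", np.round(lam, 4))
print("condition indices:", np.round(cond_index, 1))
# Given the standardized data matrix X*, the component scores would be W = X* @ V.
```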
Police Applicants Height Data - I

V
 0.1853   0.1523   0.8017   0.2782  -0.3707  -0.2327   0.1754  -0.0005   0.0104
 0.4413  -0.2348  -0.0986  -0.2312  -0.2551  -0.3191  -0.3973   0.5850  -0.1414
 0.3934   0.3336  -0.1642   0.2336   0.1239  -0.3183  -0.4953  -0.5205   0.1397
 0.4182  -0.0813   0.0284  -0.2063   0.5765  -0.3703   0.5529   0.0009   0.0040
 0.4125  -0.3000  -0.0121   0.3508   0.0559   0.4669   0.0250   0.1487   0.6106
 0.4645   0.1011  -0.2518   0.1658  -0.2697   0.3798   0.2786  -0.1539  -0.6040
 0.2141   0.3577   0.3790  -0.5862   0.2139   0.4811  -0.2484   0.0009  -0.0022
-0.0852   0.5467  -0.0498   0.4536   0.3674   0.0367  -0.0418   0.5738  -0.1352
 0.0474   0.5261  -0.3320  -0.2685  -0.4396  -0.1027   0.3445   0.1089   0.4521

L (diagonal matrix; all off-diagonal entries are 0.0000)
diag(L) = 3.6304  2.4427  1.0145  0.7656  0.6109  0.3024  0.2322  0.0009  0.0005
Police Applicants Height Data - II

VLV'
 1.0000   0.1441   0.2791   0.1483   0.1863   0.2263   0.3680   0.1147   0.0212
 0.1441   1.0000   0.4708   0.6452   0.7160   0.6617   0.1468  -0.5820  -0.0985
 0.2791   0.4708   1.0000   0.5051   0.3658   0.7284   0.4277   0.4420   0.4406
 0.1483   0.6452   0.5051   1.0000   0.6007   0.5500   0.3471  -0.1911  -0.0988
 0.1863   0.7160   0.3658   0.6007   1.0000   0.7150  -0.0298  -0.3882  -0.4098
 0.2263   0.6617   0.7284   0.5500   0.7150   1.0000   0.2821   0.0026   0.3434
 0.3680   0.1468   0.4277   0.3471  -0.0298   0.2821   1.0000   0.2445   0.3971
 0.1147  -0.5820   0.4420  -0.1911  -0.3882   0.0026   0.2445   1.0000   0.5083
 0.0212  -0.0985   0.4406  -0.0988  -0.4098   0.3434   0.3971   0.5083   1.0000

R
 1.0000   0.1441   0.2791   0.1483   0.1863   0.2264   0.3680   0.1147   0.0212
 0.1441   1.0000   0.4708   0.6452   0.7160   0.6616   0.1468  -0.5820  -0.0984
 0.2791   0.4708   1.0000   0.5050   0.3658   0.7284   0.4277   0.4420   0.4406
 0.1483   0.6452   0.5050   1.0000   0.6007   0.5500   0.3471  -0.1911  -0.0988
 0.1863   0.7160   0.3658   0.6007   1.0000   0.7150  -0.0298  -0.3882  -0.4099
 0.2264   0.6616   0.7284   0.5500   0.7150   1.0000   0.2821   0.0026   0.3434
 0.3680   0.1468   0.4277   0.3471  -0.0298   0.2821   1.0000   0.2445   0.3971
 0.1147  -0.5820   0.4420  -0.1911  -0.3882   0.0026   0.2445   1.0000   0.5082
 0.0212  -0.0984   0.4406  -0.0988  -0.4099   0.3434   0.3971   0.5082   1.0000
Regression of Y on [1|W]
$$E\{\mathbf{Y}\} = \gamma_0\mathbf{1} + \mathbf{W}\boldsymbol{\gamma}$$

Regression Statistics
Multiple R           0.944825
R Square             0.892694
Adjusted R Square    0.850704
Standard Error       1.890412
Observations         33

ANOVA
             df        SS        MS         F    Significance F
Regression    9  683.7823   75.9758   21.2600            0.0000
Residual     23   82.1941    3.5737
Total        32  765.9764

            Coefficients  Standard Error    t Stat  P-value  Lower 95%  Upper 95%
Intercept       164.5636          0.3291  500.0743   0.0000   163.8829   165.2444
W1               12.1269          0.9922   12.2227   0.0000    10.0744    14.1793
W2                4.5224          1.2096    3.7389   0.0011     2.0202     7.0245
W3                7.6160          1.8769    4.0578   0.0005     3.7334    11.4985
W4                4.9552          2.1605    2.2935   0.0313     0.4858     9.4246
W5               -3.5819          2.4185   -1.4810   0.1522    -8.5850     1.4213
W6                3.2973          3.4376    0.9592   0.3474    -3.8139    10.4085
W7                6.8268          3.9230    1.7402   0.0952    -1.2885    14.9422
W8                1.4226         64.0508    0.0222   0.9825  -131.0766   133.9219
W9              -27.5954         87.0588   -0.3170   0.7541  -207.6903   152.4995

Note that W8 and W9 have very small eigenvalues and very small t-statistics. Their condition indices are 63.5 and 85.2, both well above 10.
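The condition indices quoted for W8 and W9 follow directly from the eigenvalues on the diagonal of L (largest 3.6304, smallest two 0.0009 and 0.0005). The symbol θ_j for the condition index is used here for illustration only.

```latex
\[
\theta_j = \sqrt{\frac{\lambda_{\max}}{\lambda_j}}:
\qquad
\theta_8 = \sqrt{\frac{3.6304}{0.0009}} \approx 63.5,
\qquad
\theta_9 = \sqrt{\frac{3.6304}{0.0005}} \approx 85.2
\]
```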
Reduced Model
• Remove the last 2 principal components because of their small, insignificant t-statistics and high condition indices
• Let V(g) be the p×g matrix of the eigenvectors for the g retained principal components (p=9, g=7)
• Let W(g) = X*V(g)
• Then regress Y on [1|W(g)]

V(g)
 0.1853   0.1523   0.8017   0.2782  -0.3707  -0.2327   0.1754
 0.4413  -0.2348  -0.0986  -0.2312  -0.2551  -0.3191  -0.3973
 0.3934   0.3336  -0.1642   0.2336   0.1239  -0.3183  -0.4953
 0.4182  -0.0813   0.0284  -0.2063   0.5765  -0.3703   0.5529
 0.4125  -0.3000  -0.0121   0.3508   0.0559   0.4669   0.0250
 0.4645   0.1011  -0.2518   0.1658  -0.2697   0.3798   0.2786
 0.2141   0.3577   0.3790  -0.5862   0.2139   0.4811  -0.2484
-0.0852   0.5467  -0.0498   0.4536   0.3674   0.0367  -0.0418
 0.0474   0.5261  -0.3320  -0.2685  -0.4396  -0.1027   0.3445
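The reduced-model steps above can be sketched end to end as follows. This is a hedged illustration rather than the authors' procedure: `pcr_fit` is a hypothetical helper, the data are typed in from the data table, and the signs of the individual component coefficients may differ from the slides because eigenvector signs are arbitrary (the product V(g) times gamma-hat(g) is unaffected).

```python
import numpy as np

# Columns: Y, X1..X9 for the 33 applicants, copied from the data table
data = np.array([
    [165.8, 88.7, 31.8, 28.1, 18.7, 40.3, 38.9, 6.7, 88.4, 96.5],
    [169.8, 90.0, 32.4, 29.1, 18.3, 43.3, 42.7, 6.4, 89.8, 98.6],
    [170.7, 87.7, 33.6, 29.5, 20.7, 43.7, 41.1, 7.2, 87.8, 94.1],
    [170.9, 87.1, 31.0, 28.2, 18.6, 43.7, 40.6, 6.7, 91.0, 92.9],
    [157.5, 81.3, 32.1, 27.3, 17.5, 38.1, 39.6, 6.6, 85.0, 103.9],
    [165.9, 88.2, 31.8, 29.0, 18.6, 42.0, 40.6, 6.5, 91.2, 96.7],
    [158.7, 86.1, 30.6, 27.8, 18.4, 40.0, 37.0, 5.9, 90.8, 92.5],
    [166.0, 88.7, 30.2, 26.9, 17.5, 41.6, 39.0, 5.9, 89.1, 93.8],
    [158.7, 83.7, 31.1, 27.1, 18.1, 38.9, 37.5, 6.1, 87.1, 96.4],
    [161.5, 81.2, 32.3, 27.8, 19.1, 42.8, 40.1, 6.2, 86.1, 93.7],
    [167.3, 88.6, 34.8, 27.3, 18.3, 43.1, 41.8, 7.3, 78.4, 97.0],
    [167.4, 83.2, 34.3, 30.1, 19.2, 43.4, 42.2, 6.8, 87.8, 97.2],
    [159.2, 81.5, 31.0, 27.3, 17.5, 39.8, 39.6, 4.9, 88.1, 99.5],
    [170.0, 87.9, 34.2, 30.9, 19.4, 43.1, 43.7, 6.3, 90.4, 101.4],
    [166.3, 88.3, 30.6, 28.8, 18.3, 41.8, 41.0, 5.9, 94.1, 98.1],
    [169.0, 85.6, 32.6, 28.8, 19.1, 42.7, 42.0, 6.0, 88.3, 98.4],
    [156.2, 81.6, 31.0, 25.6, 17.0, 44.2, 39.0, 5.1, 82.6, 88.2],
    [159.6, 86.6, 32.7, 25.4, 17.7, 42.0, 37.5, 5.0, 77.7, 89.3],
    [155.0, 82.0, 30.3, 26.6, 17.3, 37.9, 36.1, 5.2, 87.8, 95.3],
    [161.1, 84.1, 29.5, 26.6, 17.8, 38.6, 38.2, 5.9, 90.2, 99.0],
    [170.3, 88.1, 34.0, 29.3, 18.2, 43.2, 41.4, 5.9, 86.2, 95.8],
    [167.8, 83.9, 32.5, 28.6, 20.2, 43.3, 42.9, 7.2, 88.0, 99.1],
    [163.1, 88.1, 31.7, 26.9, 18.1, 40.1, 39.0, 5.9, 84.9, 97.3],
    [165.8, 87.0, 33.2, 26.3, 19.5, 43.2, 40.7, 5.9, 79.2, 94.2],
    [175.4, 89.6, 35.2, 30.1, 19.1, 45.1, 44.5, 6.3, 85.5, 98.7],
    [159.8, 85.6, 31.5, 27.1, 19.2, 42.3, 39.0, 5.7, 86.0, 92.2],
    [166.0, 84.9, 30.5, 28.1, 17.8, 41.2, 43.0, 6.1, 92.1, 104.4],
    [161.2, 84.1, 32.8, 29.2, 18.4, 42.6, 41.1, 5.9, 89.0, 96.5],
    [160.4, 84.3, 30.5, 27.8, 16.8, 41.0, 39.8, 6.0, 91.1, 97.1],
    [164.3, 85.0, 35.0, 27.8, 19.0, 47.2, 42.4, 5.0, 79.4, 89.8],
    [165.5, 82.6, 36.2, 28.6, 20.2, 45.0, 42.3, 5.6, 79.0, 94.0],
    [167.2, 85.0, 33.6, 27.1, 19.8, 46.0, 41.6, 5.6, 80.7, 90.4],
    [167.2, 83.4, 33.5, 29.7, 19.4, 45.2, 44.0, 5.2, 88.7, 97.3],
])
y, X = data[:, 0], data[:, 1:]

def pcr_fit(X, y, g):
    """Regress y on [1 | W(g)], the first g principal components of R."""
    n = X.shape[0]
    # Correlation transformation, so that Xs.T @ Xs is the correlation matrix
    Xs = (X - X.mean(axis=0)) / (np.sqrt(n - 1) * X.std(axis=0, ddof=1))
    lam, V = np.linalg.eigh(Xs.T @ Xs)
    keep = np.argsort(lam)[::-1][:g]        # indices of the g largest eigenvalues
    Vg = V[:, keep]
    Wg = Xs @ Vg                            # component scores W(g) = X* V(g)
    design = np.column_stack([np.ones(n), Wg])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    s2 = resid @ resid / (n - g - 1)        # residual mean square
    return coef[0], coef[1:], Vg, s2

b0_full, gamma_full, V_full, s2_full = pcr_fit(X, y, g=9)
b0_red, gamma_red, V_red, s2_red = pcr_fit(X, y, g=7)

print("intercept:", round(b0_red, 4))
print("gamma-hat(g):", np.round(gamma_red, 4))
print("residual mean square:", round(s2_red, 4))
```

Because the columns of W are mutually orthogonal and centered, dropping W8 and W9 leaves the coefficients of the retained components unchanged; only the residual mean square and the standard errors move.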
Reduced Regression Fit

Regression Statistics
Multiple R           0.944575
R Square             0.892223
Adjusted R Square    0.862045
Standard Error       1.817195
Observations         33

ANOVA
             df        SS        MS         F    Significance F
Regression    7  683.4215   97.6316   29.5657            0.0000
Residual     25   82.5549    3.3022
Total        32  765.9764

            Coefficients  Standard Error    t Stat  P-value  Lower 95%  Upper 95%
Intercept       164.5636          0.3163  520.2229   0.0000   163.9121   165.2151
W1               12.1268          0.9537   12.7151   0.0000    10.1625    14.0910
W2                4.5224          1.1627    3.8895   0.0007     2.1277     6.9170
W3                7.6160          1.8042    4.2213   0.0003     3.9002    11.3317
W4                4.9551          2.0768    2.3859   0.0249     0.6777     9.2324
W5               -3.5819          2.3249   -1.5407   0.1360    -8.3701     1.2063
W6                3.2972          3.3044    0.9978   0.3279    -3.5084    10.1028
W7                6.8268          3.7711    1.8103   0.0823    -0.9398    14.5934
Transforming Back to X-scale
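$$\hat{\boldsymbol{\beta}}_{(g)} = \mathbf{V}_{(g)}\hat{\boldsymbol{\gamma}}_{(g)}
\qquad
s^2\{\hat{\boldsymbol{\beta}}_{(g)}\} = s^2\,\mathbf{V}_{(g)}\mathbf{L}_{(g)}^{-1}\mathbf{V}_{(g)}'$$

$$s^2 = 3.3022$$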
      gamma-hat(g)          beta-hat(g)   StdErr
W1         12.1268    X1*       12.1779   2.0639
W2          4.5224    X2*       -0.4583   2.0549
W3          7.6160    X3*        1.3113   2.3006
W4          4.9551    X4*        4.3866   2.8275
W5         -3.5819    X5*        6.8020   1.7926
W6          3.2972    X6*        9.1146   1.8993
W7          6.8268    X7*        3.3197   2.4118
                      X8*        1.8268   1.4407
                      X9*        2.6829   1.9731

V{beta-hat(g)}
 4.2598  -0.1779  -0.6883   1.0454  -0.8386  -0.0887  -1.8757  -0.4214   0.9289
-0.1779   4.2228   3.6089  -2.2379  -1.9307  -2.4561  -0.1330  -1.0423  -0.7562
-0.6883   3.6089   5.2928  -2.3318  -1.3892  -2.9496  -0.3347   1.1128  -2.2031
 1.0454  -2.2379  -2.3318   7.9948  -1.6401  -0.1911  -2.6329   0.1667   1.9223
-0.8386  -1.9307  -1.3892  -1.6401   3.2135   2.3480   1.4626   0.7180  -1.1223
-0.0887  -2.4561  -2.9496  -0.1911   2.3480   3.6074   0.1090  -0.1452   1.7520
-1.8757  -0.1330  -0.3347  -2.6329   1.4626   0.1090   5.8170  -0.1949  -1.7317
-0.4214  -1.0423   1.1128   0.1667   0.7180  -0.1452  -0.1949   2.0755  -1.2055
 0.9289  -0.7562  -2.2031   1.9223  -1.1223   1.7520  -1.7317  -1.2055   3.8931
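The back-transformation can be reproduced from the printed quantities alone. The sketch below types in V(g), gamma-hat(g), the seven retained eigenvalues, and s² as they appear on the slides; because those inputs are rounded, the results should match the tabulated beta-hat(g) and standard errors only to about three decimals. Variable names are illustrative.

```python
import numpy as np

# V(g): first 7 eigenvectors (columns), as printed on the slide
V_g = np.array([
    [ 0.1853,  0.1523,  0.8017,  0.2782, -0.3707, -0.2327,  0.1754],
    [ 0.4413, -0.2348, -0.0986, -0.2312, -0.2551, -0.3191, -0.3973],
    [ 0.3934,  0.3336, -0.1642,  0.2336,  0.1239, -0.3183, -0.4953],
    [ 0.4182, -0.0813,  0.0284, -0.2063,  0.5765, -0.3703,  0.5529],
    [ 0.4125, -0.3000, -0.0121,  0.3508,  0.0559,  0.4669,  0.0250],
    [ 0.4645,  0.1011, -0.2518,  0.1658, -0.2697,  0.3798,  0.2786],
    [ 0.2141,  0.3577,  0.3790, -0.5862,  0.2139,  0.4811, -0.2484],
    [-0.0852,  0.5467, -0.0498,  0.4536,  0.3674,  0.0367, -0.0418],
    [ 0.0474,  0.5261, -0.3320, -0.2685, -0.4396, -0.1027,  0.3445],
])
# Coefficients of W1..W7 from the reduced regression fit
gamma_g = np.array([12.1268, 4.5224, 7.6160, 4.9551, -3.5819, 3.2972, 6.8268])
# Eigenvalues of the 7 retained components (diagonal of L(g))
lam_g = np.array([3.6304, 2.4427, 1.0145, 0.7656, 0.6109, 0.3024, 0.2322])
s2 = 3.3022  # residual mean square of the reduced fit

beta_g = V_g @ gamma_g                               # beta-hat(g) = V(g) gamma-hat(g)
cov_beta = s2 * V_g @ np.diag(1.0 / lam_g) @ V_g.T   # s^2 V(g) L(g)^-1 V(g)'
se_beta = np.sqrt(np.diag(cov_beta))

for j, (b, se) in enumerate(zip(beta_g, se_beta), start=1):
    print(f"X{j}*: beta-hat = {b:8.4f}   StdErr = {se:6.4f}")
```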
Comparison of Coefficients and SEs
                 Original Model                Principal Components
            Coefficients  Standard Error      beta-hat(g)   StdErr
Intercept       164.5636          0.3291
X1*              11.8900          2.3307          12.1779   2.0639
X2*               4.2752         39.4941          -0.4583   2.0549
X3*              -3.2845         35.5676           1.3113   2.3006
X4*               4.2764          2.9629           4.3866   2.8275
X5*              -9.8372         54.0398           6.8020   1.7926
X6*              25.5626         53.5337           9.1146   1.8993
X7*               3.3805          2.5166           3.3197   2.4118
X8*               6.3735         38.6215           1.8268   1.4407
X9*              -9.6391         40.0289           2.6829   1.9731