multicolline.pdf

Upload: tue-nguyen

Post on 02-Jun-2018


  • 8/10/2019 Multicolline.pdf

    1/15

    Multicollinearity in Regression

    Principal Components Analysis

    Standing Heights and Physical Stature Attributes Among Female Police Officer Applicants

    S.Q. Lafi and J.B. Kaneene (1992). "An Explanation of the Use of Principal Components Analysis to Detect and Correct for Multicollinearity," Preventive Veterinary Medicine, Vol. 13, pp. 261-275


    Data Description

    Subjects: 33 females applying for police officer positions

    Dependent Variable: Y Standing Height (cm)

    Independent Variables:

    X1 Sitting Height (cm)

    X2 Upper Arm Length (cm)

    X3 Forearm Length (cm)

    X4 Hand Length (cm)

    X5 Upper Leg Length (cm)

    X6 Lower Leg Length (cm)

    X7 Foot Length (inches)

    X8 BRACH (100 × X3 / X2)

    X9 TIBIO (100 × X6 / X5)


    Data

    ID Y X1 X2 X3 X4 X5 X6 X7 X8 X9

    1 165.8 88.7 31.8 28.1 18.7 40.3 38.9 6.7 88.4 96.5

    2 169.8 90.0 32.4 29.1 18.3 43.3 42.7 6.4 89.8 98.6

    3 170.7 87.7 33.6 29.5 20.7 43.7 41.1 7.2 87.8 94.1

    4 170.9 87.1 31.0 28.2 18.6 43.7 40.6 6.7 91.0 92.9

    5 157.5 81.3 32.1 27.3 17.5 38.1 39.6 6.6 85.0 103.9

    6 165.9 88.2 31.8 29.0 18.6 42.0 40.6 6.5 91.2 96.7

    7 158.7 86.1 30.6 27.8 18.4 40.0 37.0 5.9 90.8 92.5

    8 166.0 88.7 30.2 26.9 17.5 41.6 39.0 5.9 89.1 93.8

    9 158.7 83.7 31.1 27.1 18.1 38.9 37.5 6.1 87.1 96.4

    10 161.5 81.2 32.3 27.8 19.1 42.8 40.1 6.2 86.1 93.7

    11 167.3 88.6 34.8 27.3 18.3 43.1 41.8 7.3 78.4 97.0

    12 167.4 83.2 34.3 30.1 19.2 43.4 42.2 6.8 87.8 97.2

    13 159.2 81.5 31.0 27.3 17.5 39.8 39.6 4.9 88.1 99.5

    14 170.0 87.9 34.2 30.9 19.4 43.1 43.7 6.3 90.4 101.4

    15 166.3 88.3 30.6 28.8 18.3 41.8 41.0 5.9 94.1 98.1

    16 169.0 85.6 32.6 28.8 19.1 42.7 42.0 6.0 88.3 98.4

    17 156.2 81.6 31.0 25.6 17.0 44.2 39.0 5.1 82.6 88.2

    18 159.6 86.6 32.7 25.4 17.7 42.0 37.5 5.0 77.7 89.3

    19 155.0 82.0 30.3 26.6 17.3 37.9 36.1 5.2 87.8 95.3

    20 161.1 84.1 29.5 26.6 17.8 38.6 38.2 5.9 90.2 99.0

    21 170.3 88.1 34.0 29.3 18.2 43.2 41.4 5.9 86.2 95.8

    22 167.8 83.9 32.5 28.6 20.2 43.3 42.9 7.2 88.0 99.1

    23 163.1 88.1 31.7 26.9 18.1 40.1 39.0 5.9 84.9 97.3

    24 165.8 87.0 33.2 26.3 19.5 43.2 40.7 5.9 79.2 94.2

    25 175.4 89.6 35.2 30.1 19.1 45.1 44.5 6.3 85.5 98.7

    26 159.8 85.6 31.5 27.1 19.2 42.3 39.0 5.7 86.0 92.2

    27 166.0 84.9 30.5 28.1 17.8 41.2 43.0 6.1 92.1 104.4

    28 161.2 84.1 32.8 29.2 18.4 42.6 41.1 5.9 89.0 96.5

    29 160.4 84.3 30.5 27.8 16.8 41.0 39.8 6.0 91.1 97.1

    30 164.3 85.0 35.0 27.8 19.0 47.2 42.4 5.0 79.4 89.8

    31 165.5 82.6 36.2 28.6 20.2 45.0 42.3 5.6 79.0 94.0

    32 167.2 85.0 33.6 27.1 19.8 46.0 41.6 5.6 80.7 90.4

    33 167.2 83.4 33.5 29.7 19.4 45.2 44.0 5.2 88.7 97.3


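    Standardizing the Predictors

    $$X_{ij}^{*} = \frac{1}{\sqrt{n-1}} \left( \frac{X_{ij} - \bar{X}_j}{S_j} \right), \qquad i = 1,\ldots,33; \quad j = 1,\ldots,9$$

    $$\bar{X}_j = \frac{1}{n} \sum_{i=1}^{n} X_{ij}, \qquad S_j = \sqrt{\frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2}{n-1}}$$

    $$\mathbf{X}^{*} = \begin{bmatrix} X_{11}^{*} & X_{12}^{*} & \cdots & X_{19}^{*} \\ X_{21}^{*} & X_{22}^{*} & \cdots & X_{29}^{*} \\ \vdots & \vdots & & \vdots \\ X_{33,1}^{*} & X_{33,2}^{*} & \cdots & X_{33,9}^{*} \end{bmatrix}$$

    $$\mathbf{X}^{*\prime} \mathbf{X}^{*} = \mathbf{R} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{19} \\ r_{21} & 1 & \cdots & r_{29} \\ \vdots & \vdots & & \vdots \\ r_{91} & r_{92} & \cdots & 1 \end{bmatrix}$$

    $$r_{jk} = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)}{\sqrt{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2 \, \sum_{i=1}^{n} (X_{ik} - \bar{X}_k)^2}}$$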
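    The standardization above can be sketched in NumPy. This is a minimal illustration, not the authors' code: the function name `correlation_transform` is an assumption, and the small array uses only the first five rows of X2, X3, and X4 from the data table, so its correlations are not those of the full data set.

```python
import numpy as np

def correlation_transform(X):
    """Center each column, then scale by its sample SD and by sqrt(n - 1)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    s = X.std(axis=0, ddof=1)              # sample standard deviation S_j
    return centered / (np.sqrt(n - 1) * s)

# First five rows of X2, X3, X4 from the data table (illustration only).
X = np.array([
    [31.8, 28.1, 18.7],
    [32.4, 29.1, 18.3],
    [33.6, 29.5, 20.7],
    [31.0, 28.2, 18.6],
    [32.1, 27.3, 17.5],
])

X_star = correlation_transform(X)
R = X_star.T @ X_star                      # X*'X* is the correlation matrix
print(np.round(R, 4))
```

    With this scaling, every column of X* has mean 0 and unit length, which is why X*'X* has ones on its diagonal.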


    Variance Inflation Factors (VIFs)

    A VIF measures the extent to which a regression coefficient's variance is inflated due to correlations among the set of predictors.

    $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of multiple determination when $X_j$ is regressed on the remaining predictors.

    Values > 10 are often considered to be problematic.

    VIFs can be obtained as the diagonal elements of $\mathbf{R}^{-1}$.

    VIFs

    X1 X2 X3 X4 X5 X6 X7 X8 X9

    1.52 436.47 353.99 2.46 817.17 801.94 1.77 417.39 448.37

    Not surprisingly, X2, X3, X5, X6, X8, and X9 are problems (see definitions of X8 and X9)
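    The two routes to the VIFs described above (the diagonal of the inverse correlation matrix, and 1/(1 - R_j^2) from auxiliary regressions) can be checked against each other. The sketch below uses simulated data, not the police applicant data; the function names and the simulated predictors are assumptions made for illustration.

```python
import numpy as np

def vif_from_corr(R):
    """VIFs as the diagonal elements of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(R))

def vif_by_regression(X):
    """VIF_j = 1 / (1 - R_j^2), regressing X_j on the other predictors."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        total = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (total @ total)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated predictors (hypothetical): the third is nearly the sum of the
# first two, loosely mimicking how X8 and X9 are built from other columns.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = rng.normal(size=50)
X = np.column_stack([a, b, a + b + 0.05 * rng.normal(size=50)])

vifs = vif_from_corr(np.corrcoef(X, rowvar=False))
print(np.round(vifs, 1))
```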


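    Regression of Y on [1|X*]

    $$E\{Y_i\} = \beta_0 + \beta_1 X_{i1}^{*} + \cdots + \beta_9 X_{i9}^{*} \qquad E\{\mathbf{Y}\} = \beta_0 \mathbf{1} + \mathbf{X}^{*} \boldsymbol{\beta}$$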

    Regression Statistics

    Multiple R 0.944825

    R Square 0.892694

    Adjusted R Square 0.850704

    Standard Error 1.890412

    Observations 33

    ANOVA

    df SS MS F Significance F

    Regression 9 683.7823 75.9758 21.2600 0.0000

    Residual 23 82.1941 3.5737

    Total 32 765.9764

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 164.5636 0.3291 500.0743 0.0000 163.8829 165.2444

    X1* 11.8900 2.3307 5.1015 0.0000 7.0686 16.7114

    X2* 4.2752 39.4941 0.1082 0.9147 -77.4246 85.9751

    X3* -3.2845 35.5676 -0.0923 0.9272 -76.8616 70.2927

    X4* 4.2764 2.9629 1.4433 0.1624 -1.8528 10.4057

    X5* -9.8372 54.0398 -0.1820 0.8571 -121.6270 101.9525

    X6* 25.5626 53.5337 0.4775 0.6375 -85.1802 136.3055

    X7* 3.3805 2.5166 1.3433 0.1923 -1.8255 8.5865

    X8* 6.3735 38.6215 0.1650 0.8704 -73.5211 86.2682

    X9* -9.6391 40.0289 -0.2408 0.8118 -92.4453 73.1670

    Note the surprising negative coefficients for X3*, X5*, and X9*


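    Principal Components Analysis

    Using a statistical or matrix computer package, decompose the correlation matrix into its eigenvalues and eigenvectors:

    $$\mathbf{X}^{*\prime} \mathbf{X}^{*} = \mathbf{R} = \sum_{j=1}^{p} \lambda_j \mathbf{v}_j \mathbf{v}_j' = \mathbf{V} \mathbf{L} \mathbf{V}'$$

    where $\lambda_j$ is the $j$th eigenvalue and $\mathbf{v}_j = (v_{j1}, \ldots, v_{jp})'$ is the $j$th eigenvector, with

    $$\mathbf{V} = [\mathbf{v}_1 \; \mathbf{v}_2 \; \cdots \; \mathbf{v}_p], \qquad \mathbf{L} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix}$$

    subject to: $\mathbf{v}_j' \mathbf{v}_j = 1$ and $\mathbf{v}_j' \mathbf{v}_k = 0$ for $j \neq k$

    Condition Index: $\theta_j = \sqrt{\lambda_{\max} / \lambda_j}$

    Principal Components: $\mathbf{W} = \mathbf{X}^{*} \mathbf{V}$

    While the columns of X* are highly correlated, the columns of W are uncorrelated

    The λs represent the variance corresponding to each principal component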
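    A minimal NumPy sketch of the decomposition and the condition index follows. The function names are assumptions; `numpy.linalg.eigh` is used because R is symmetric, and its eigenvalues are re-sorted into descending order. The eigenvalues in `lam` are the diagonal of L as reported for the police applicant data.

```python
import numpy as np

def principal_components(X_star):
    """Eigen-decompose R = X*'X* and form W = X* V, largest eigenvalue first."""
    R = X_star.T @ X_star
    eigvals, V = np.linalg.eigh(R)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    W = X_star @ V
    return eigvals, V, W

def condition_indices(eigvals):
    """Condition index sqrt(lambda_max / lambda_j) for each eigenvalue."""
    eigvals = np.asarray(eigvals, dtype=float)
    return np.sqrt(eigvals.max() / eigvals)

# Diagonal of L as reported for the police applicant data.
lam = [3.6304, 2.4427, 1.0145, 0.7656, 0.6109, 0.3024, 0.2322, 0.0009, 0.0005]
ci = condition_indices(lam)
print(np.round(ci, 1))
```

    Because the columns of V are orthonormal, W'W = V'RV = L, so the columns of W are mutually orthogonal and the sum of squares of the jth column equals λ_j.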


    Police Applicants Height Data - I

    V

    0.1853 0.1523 0.8017 0.2782 -0.3707 -0.2327 0.1754 -0.0005 0.0104

    0.4413 -0.2348 -0.0986 -0.2312 -0.2551 -0.3191 -0.3973 0.5850 -0.1414

    0.3934 0.3336 -0.1642 0.2336 0.1239 -0.3183 -0.4953 -0.5205 0.1397

    0.4182 -0.0813 0.0284 -0.2063 0.5765 -0.3703 0.5529 0.0009 0.0040

    0.4125 -0.3000 -0.0121 0.3508 0.0559 0.4669 0.0250 0.1487 0.6106

    0.4645 0.1011 -0.2518 0.1658 -0.2697 0.3798 0.2786 -0.1539 -0.6040

    0.2141 0.3577 0.3790 -0.5862 0.2139 0.4811 -0.2484 0.0009 -0.0022

    -0.0852 0.5467 -0.0498 0.4536 0.3674 0.0367 -0.0418 0.5738 -0.1352

    0.0474 0.5261 -0.3320 -0.2685 -0.4396 -0.1027 0.3445 0.1089 0.4521

    L

    3.6304 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

    0.0000 2.4427 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

    0.0000 0.0000 1.0145 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

    0.0000 0.0000 0.0000 0.7656 0.0000 0.0000 0.0000 0.0000 0.0000

    0.0000 0.0000 0.0000 0.0000 0.6109 0.0000 0.0000 0.0000 0.0000

    0.0000 0.0000 0.0000 0.0000 0.0000 0.3024 0.0000 0.0000 0.0000

    0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.2322 0.0000 0.0000

    0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0009 0.0000

    0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0005


    Police Applicants Height Data - II

    VLV'

    1.0000 0.1441 0.2791 0.1483 0.1863 0.2263 0.3680 0.1147 0.0212

    0.1441 1.0000 0.4708 0.6452 0.7160 0.6617 0.1468 -0.5820 -0.0985

    0.2791 0.4708 1.0000 0.5051 0.3658 0.7284 0.4277 0.4420 0.4406

    0.1483 0.6452 0.5051 1.0000 0.6007 0.5500 0.3471 -0.1911 -0.0988

    0.1863 0.7160 0.3658 0.6007 1.0000 0.7150 -0.0298 -0.3882 -0.4098

    0.2263 0.6617 0.7284 0.5500 0.7150 1.0000 0.2821 0.0026 0.3434

    0.3680 0.1468 0.4277 0.3471 -0.0298 0.2821 1.0000 0.2445 0.3971

    0.1147 -0.5820 0.4420 -0.1911 -0.3882 0.0026 0.2445 1.0000 0.5083

    0.0212 -0.0985 0.4406 -0.0988 -0.4098 0.3434 0.3971 0.5083 1.0000

    R

    1.0000 0.1441 0.2791 0.1483 0.1863 0.2264 0.3680 0.1147 0.0212

    0.1441 1.0000 0.4708 0.6452 0.7160 0.6616 0.1468 -0.5820 -0.0984

    0.2791 0.4708 1.0000 0.5050 0.3658 0.7284 0.4277 0.4420 0.4406

    0.1483 0.6452 0.5050 1.0000 0.6007 0.5500 0.3471 -0.1911 -0.0988

    0.1863 0.7160 0.3658 0.6007 1.0000 0.7150 -0.0298 -0.3882 -0.4099

    0.2264 0.6616 0.7284 0.5500 0.7150 1.0000 0.2821 0.0026 0.3434

    0.3680 0.1468 0.4277 0.3471 -0.0298 0.2821 1.0000 0.2445 0.3971

    0.1147 -0.5820 0.4420 -0.1911 -0.3882 0.0026 0.2445 1.0000 0.5082

    0.0212 -0.0984 0.4406 -0.0988 -0.4099 0.3434 0.3971 0.5082 1.0000


    Regression of Y on [1|W]

    Regression Statistics

    Multiple R 0.944825

    R Square 0.892694

    Adjusted R Square 0.850704

    Standard Error 1.890412

    Observations 33

    ANOVA

    df SS MS F Significance F

    Regression 9 683.7823 75.9758 21.2600 0.0000

    Residual 23 82.1941 3.5737

    Total 32 765.9764

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 164.5636 0.3291 500.0743 0.0000 163.8829 165.2444

    W1 12.1269 0.9922 12.2227 0.0000 10.0744 14.1793

    W2 4.5224 1.2096 3.7389 0.0011 2.0202 7.0245

    W3 7.6160 1.8769 4.0578 0.0005 3.7334 11.4985

    W4 4.9552 2.1605 2.2935 0.0313 0.4858 9.4246

    W5 -3.5819 2.4185 -1.4810 0.1522 -8.5850 1.4213

    W6 3.2973 3.4376 0.9592 0.3474 -3.8139 10.4085

    W7 6.8268 3.9230 1.7402 0.0952 -1.2885 14.9422

    W8 1.4226 64.0508 0.0222 0.9825 -131.0766 133.9219

    W9 -27.5954 87.0588 -0.3170 0.7541 -207.6903 152.4995

    Note that W8 and W9 have very small eigenvalues and very small t-statistics

    Their condition indices are 63.5 and 85.2, both well above 10

    $$E\{\mathbf{Y}\} = \gamma_0 \mathbf{1} + \mathbf{W} \boldsymbol{\gamma}$$


    Reduced Model

    Remove the last 2 principal components because of their small, insignificant t-statistics and high condition indices

    Let V(g) be the p×g matrix of the eigenvectors for the g retained principal components (p=9, g=7)

    Let W(g) = X*V(g)

    Then regress Y on [1|W(g)]

    V(g)

    0.1853 0.1523 0.8017 0.2782 -0.3707 -0.2327 0.1754

    0.4413 -0.2348 -0.0986 -0.2312 -0.2551 -0.3191 -0.3973

    0.3934 0.3336 -0.1642 0.2336 0.1239 -0.3183 -0.4953

    0.4182 -0.0813 0.0284 -0.2063 0.5765 -0.3703 0.5529

    0.4125 -0.3000 -0.0121 0.3508 0.0559 0.4669 0.0250

    0.4645 0.1011 -0.2518 0.1658 -0.2697 0.3798 0.2786

    0.2141 0.3577 0.3790 -0.5862 0.2139 0.4811 -0.2484

    -0.0852 0.5467 -0.0498 0.4536 0.3674 0.0367 -0.0418

    0.0474 0.5261 -0.3320 -0.2685 -0.4396 -0.1027 0.3445
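    The reduced-model steps can be sketched as follows. This is an illustration on simulated data under stated assumptions: the helper `pcr_gamma`, the simulated predictors, and the response are all hypothetical, and g is simply passed in rather than chosen from t-statistics or condition indices.

```python
import numpy as np

def pcr_gamma(X_star, y, g):
    """Regress y on [1 | W(g)], where W(g) = X* V(g) keeps the g largest components."""
    eigvals, V = np.linalg.eigh(X_star.T @ X_star)
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    V_g = V[:, order[:g]]                  # p x g matrix of retained eigenvectors
    W_g = X_star @ V_g
    design = np.column_stack([np.ones(len(y)), W_g])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:], V_g

# Simulated data (hypothetical): the 4th predictor is nearly x1 + x2.
rng = np.random.default_rng(2)
n = 33
Z = rng.normal(size=(n, 3))
Z = np.column_stack([Z, Z[:, 0] + Z[:, 1] + 0.01 * rng.normal(size=n)])
Zc = Z - Z.mean(axis=0)
X_star = Zc / np.sqrt((Zc ** 2).sum(axis=0))   # correlation transformation
y = 160 + Z[:, 0] + 2 * Z[:, 2] + rng.normal(size=n)

b0_full, gamma_full, _ = pcr_gamma(X_star, y, g=4)
b0_red, gamma_red, V_g = pcr_gamma(X_star, y, g=3)
print(np.round(gamma_full, 4))
print(np.round(gamma_red, 4))
```

    Because the columns of W are orthogonal to each other and to the intercept column, dropping components leaves the remaining coefficient estimates unchanged; only the residual variance, and therefore the standard errors, move.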


    Reduced Regression Fit

    SUMMARY OUTPUT

    Regression Statistics

    Multiple R 0.944575

    R Square 0.892223

    Adjusted R Square 0.862045

    Standard Error 1.817195

    Observations 33

    ANOVA

    df SS MS F Significance F

    Regression 7 683.4215 97.6316 29.5657 0.0000

    Residual 25 82.5549 3.3022

    Total 32 765.9764

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 164.5636 0.3163 520.2229 0.0000 163.9121 165.2151

    W1 12.1268 0.9537 12.7151 0.0000 10.1625 14.0910

    W2 4.5224 1.1627 3.8895 0.0007 2.1277 6.9170

    W3 7.6160 1.8042 4.2213 0.0003 3.9002 11.3317

    W4 4.9551 2.0768 2.3859 0.0249 0.6777 9.2324

    W5 -3.5819 2.3249 -1.5407 0.1360 -8.3701 1.2063

    W6 3.2972 3.3044 0.9978 0.3279 -3.5084 10.1028

    W7 6.8268 3.7711 1.8103 0.0823 -0.9398 14.5934


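    Transforming Back to X-scale

    $$\hat{\boldsymbol{\beta}}_{(g)} = \mathbf{V}_{(g)} \hat{\boldsymbol{\gamma}}_{(g)} \qquad \hat{V}\{\hat{\boldsymbol{\beta}}_{(g)}\} = s^2 \, \mathbf{V}_{(g)} \mathbf{L}_{(g)}^{-1} \mathbf{V}_{(g)}'$$

    $s^2$ = 3.3022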

    Component gamma-hat(g)

    W1 12.1268
    W2 4.5224
    W3 7.6160
    W4 4.9551
    W5 -3.5819
    W6 3.2972
    W7 6.8268

    Predictor beta-hat(g) StdErr

    X1* 12.1779 2.0639
    X2* -0.4583 2.0549
    X3* 1.3113 2.3006
    X4* 4.3866 2.8275
    X5* 6.8020 1.7926
    X6* 9.1146 1.8993
    X7* 3.3197 2.4118
    X8* 1.8268 1.4407
    X9* 2.6829 1.9731

    V{beta-hat(g)}

    4.2598 -0.1779 -0.6883 1.0454 -0.8386 -0.0887 -1.8757 -0.4214 0.9289

    -0.1779 4.2228 3.6089 -2.2379 -1.9307 -2.4561 -0.1330 -1.0423 -0.7562

    -0.6883 3.6089 5.2928 -2.3318 -1.3892 -2.9496 -0.3347 1.1128 -2.2031

    1.0454 -2.2379 -2.3318 7.9948 -1.6401 -0.1911 -2.6329 0.1667 1.9223

    -0.8386 -1.9307 -1.3892 -1.6401 3.2135 2.3480 1.4626 0.7180 -1.1223

    -0.0887 -2.4561 -2.9496 -0.1911 2.3480 3.6074 0.1090 -0.1452 1.7520

    -1.8757 -0.1330 -0.3347 -2.6329 1.4626 0.1090 5.8170 -0.1949 -1.7317

    -0.4214 -1.0423 1.1128 0.1667 0.7180 -0.1452 -0.1949 2.0755 -1.2055

    0.9289 -0.7562 -2.2031 1.9223 -1.1223 1.7520 -1.7317 -1.2055 3.8931
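    The back-transformation can be reproduced from numbers already given in the slides: V(g), the seven retained eigenvalues, the reduced-fit coefficients, and s². The sketch below is a check under that assumption, with all inputs rounded to four decimals as printed, so small discrepancies in the last digits are expected.

```python
import numpy as np

# V(g): eigenvectors of the 7 retained components (rows X1*..X9*), as printed.
V_g = np.array([
    [ 0.1853,  0.1523,  0.8017,  0.2782, -0.3707, -0.2327,  0.1754],
    [ 0.4413, -0.2348, -0.0986, -0.2312, -0.2551, -0.3191, -0.3973],
    [ 0.3934,  0.3336, -0.1642,  0.2336,  0.1239, -0.3183, -0.4953],
    [ 0.4182, -0.0813,  0.0284, -0.2063,  0.5765, -0.3703,  0.5529],
    [ 0.4125, -0.3000, -0.0121,  0.3508,  0.0559,  0.4669,  0.0250],
    [ 0.4645,  0.1011, -0.2518,  0.1658, -0.2697,  0.3798,  0.2786],
    [ 0.2141,  0.3577,  0.3790, -0.5862,  0.2139,  0.4811, -0.2484],
    [-0.0852,  0.5467, -0.0498,  0.4536,  0.3674,  0.0367, -0.0418],
    [ 0.0474,  0.5261, -0.3320, -0.2685, -0.4396, -0.1027,  0.3445],
])
# Reduced-fit coefficients for W1..W7, the first 7 eigenvalues, and s^2.
gamma_hat = np.array([12.1268, 4.5224, 7.6160, 4.9551, -3.5819, 3.2972, 6.8268])
lam_g = np.array([3.6304, 2.4427, 1.0145, 0.7656, 0.6109, 0.3024, 0.2322])
s2 = 3.3022

beta_hat = V_g @ gamma_hat                          # coefficients on the X* scale
cov_beta = s2 * V_g @ np.diag(1.0 / lam_g) @ V_g.T  # s^2 V(g) L(g)^{-1} V(g)'
se_beta = np.sqrt(np.diag(cov_beta))

for name, b, se in zip([f"X{j}*" for j in range(1, 10)], beta_hat, se_beta):
    print(f"{name}  {b:8.4f}  {se:7.4f}")
```

    Note that all nine predictors receive a coefficient even though only seven components were retained, since each row of V(g) mixes the retained components back into one X* variable.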


    Comparison of Coefficients and SEs

                 Original Model               Principal Components
    Term         Coefficient  Standard Error  beta-hat(g)  StdErr
    Intercept    164.5636     0.3291
    X1*          11.8900      2.3307          12.1779      2.0639
    X2*          4.2752       39.4941         -0.4583      2.0549
    X3*          -3.2845      35.5676         1.3113       2.3006
    X4*          4.2764       2.9629          4.3866       2.8275
    X5*          -9.8372      54.0398         6.8020       1.7926
    X6*          25.5626      53.5337         9.1146       1.8993
    X7*          3.3805       2.5166          3.3197       2.4118
    X8*          6.3735       38.6215         1.8268       1.4407
    X9*          -9.6391      40.0289         2.6829       1.9731