6338_multicollinearity & autocorrelation


TRANSCRIPT

• Slide 1/28

    MULTICOLLINEARITY

    AUTOCORRELATION

• Slide 2/28

Multicollinearity: the theory of causation and multiple causation

Interdependence between the independent variables and the variability of the dependent variable

Parsimony and linear regression

Theoretical consistency and parsimony

[Diagram: regressors X1, X2, X3, X4 and X5 jointly determining Y]

• Slide 3/28

One of the assumptions of the CLRM is that there is no multicollinearity amongst the explanatory variables. Multicollinearity refers to a perfect or exact relationship among some or all of the explanatory variables.

Expl.:

X1    X2    X2*
10    50    52
15    75    75
18    90    97
24   120   129
30   150   152

• Slide 4/28

X2i = 5X1i, and X2* was created by adding 2, 0, 7, 9 and 2, taken from a random number table, to X2. Here r1,2 = 1 and r2,2* = 0.99.

X1 and X2 show perfect multicollinearity; X2 and X2* show near-perfect multicollinearity.

The problem of multicollinearity, and its degree, varies across types of data. The overlap between the variables indicates its extent, as shown in the Venn diagram.
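These correlations are easy to verify. A minimal numpy sketch (not part of the original slides) using the data from the table above:

```python
import numpy as np

# Slide 3's example data: X2 = 5*X1 exactly; X2* adds 2, 0, 7, 9, 2
# (drawn from a random number table) to X2.
x1 = np.array([10, 15, 18, 24, 30])
x2 = 5 * x1                                # perfect multicollinearity with x1
x2_star = x2 + np.array([2, 0, 7, 9, 2])

print(np.corrcoef(x1, x2)[0, 1])           # 1.0   (r_{1,2} = 1)
print(np.corrcoef(x2, x2_star)[0, 1])      # ~0.99 (r_{2,2*} = 0.99)
```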

• Slide 5/28

Example:

Y = a + b1X1 + b2X2 + u

where Y = consumption expenditure, X1 = income and X2 = wealth. Consumption expenditure depends on income (X1) and wealth (X2).

The estimated equation from a set of data is as follows:

Ŷ = 24.77 + 0.94X1 − 0.04X2
t:      (3.66)  (1.14)  (−0.52)
R² = 0.96, adjusted R² = 0.95, F = 92.40

The individual coefficients are not significant although the F value suggests a high degree of association, and the sign on X2 is wrong.

• Slide 6/28

The fact that the F test is significant but the t values of X1 and X2 are individually insignificant means that the two variables are so highly correlated that it is impossible to isolate the individual impact of either income or wealth on consumption.

Let us regress X2 on X1:

X2 = 7.54 + 10.19X1
t:    (0.25)  (62.04)    R² = 0.99

This shows near-perfect multicollinearity between X2 and X1.

• Slide 7/28

Y on X1:                     Y on X2:
Ŷ = 24.24 + 0.51X1           Ŷ = 24.41 + 0.05X2
t:   (3.81)  (14.24)         t:   (3.55)  (13.29)
R² = 0.96                    R² = 0.96

In the second regression wealth has a significant impact. Dropping the highly collinear variable has made the other variable significant.
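The same pattern can be reproduced with simulated data. The sketch below is illustrative only: the income and wealth figures are invented, not the slides' dataset, and statsmodels is assumed to be available.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.uniform(80, 260, size=50)
wealth = 10 * income + rng.normal(0, 20, size=50)    # wealth tracks income closely
consumption = 25 + 0.9 * income + rng.normal(0, 10, size=50)

# Full model: high R^2 and a significant F, yet weak individual t ratios.
X = sm.add_constant(np.column_stack([income, wealth]))
full = sm.OLS(consumption, X).fit()
print(full.rsquared, full.fvalue)
print(full.tvalues)

# Dropping the collinear regressor typically restores significance.
single = sm.OLS(consumption, sm.add_constant(income)).fit()
print(single.tvalues)
```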

• Slide 8/28

Sources of Multicollinearity

Data collection method employed: sampling over a limited range of the values taken by the regressors in the population.

Constraints on the model or in the population being sampled: e.g. a regression of electricity consumption on income and house size. There is a constraint: families with higher incomes may have larger homes.


• Slide 10/28

Practical Consequences of Multicollinearity:

In cases of near-perfect or high multicollinearity one is likely to encounter the following consequences:

1. The OLS estimators have large variances and covariances, making precise estimation difficult.

2. (a) Because of 1, the confidence intervals tend to be much wider, leading to acceptance of the zero null hypothesis (i.e. that the true population coefficient is zero) more readily.

   (b) Because of 1, the t ratios of one or more coefficients tend to be statistically insignificant.

• Slide 11/28

Practical Consequences of Multicollinearity:

3. Although the t ratio(s) of one or more coefficients is/are statistically insignificant, R², the overall measure of goodness of fit, can be very high.

4. The OLS estimators and their S.E.s can be sensitive to small changes in the data.
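Consequence 4 is easy to demonstrate numerically. A toy sketch in plain numpy (data and magnitudes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=30)
x2 = x1 + rng.normal(scale=0.01, size=30)    # nearly identical to x1
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=30)

def ols(y, X):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_before = ols(y, np.column_stack([x1, x2]))

y_perturbed = y.copy()
y_perturbed[0] += 0.5                        # nudge a single observation
b_after = ols(y_perturbed, np.column_stack([x1, x2]))

print(b_before)   # coefficients on x1 and x2 ...
print(b_after)    # ... can shift markedly after a tiny data change
```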


• Slide 13/28

Remedial Measures

1. A priori information and articulation
2. Dropping a highly collinear variable
3. Transformation of the data (see the sketch after this list)
4. Additional information or new data
5. Identifying the purpose: reduce the degree of multicollinearity if the purpose is estimation, or simply identify it if the purpose is prediction.
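The slides do not specify which transformation is meant in remedy 3; one common choice is first-differencing trending series, sketched below with invented data:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(40)
a = 2.0 * t + rng.normal(scale=1.0, size=40)     # two series sharing a trend
b = 3.0 * t + rng.normal(scale=1.0, size=40)

print(np.corrcoef(a, b)[0, 1])                    # ~1 in levels
print(np.corrcoef(np.diff(a), np.diff(b))[0, 1])  # near zero after differencing
```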

• Slide 14/28

AUTOCORRELATION

The assumption E(uu′) = σ²I

Each u distribution has the same variance (homoscedasticity), and all disturbances are pairwise uncorrelated. This assumption gives

E(ui uj) = 0 for i ≠ j

This assumption, when violated, leads to:

1. Heteroscedasticity
2. Autocorrelation

| Var(u1)     Cov(u1,u2)  ...  Cov(u1,un) |     | σ²   0   ...  0  |
| Cov(u2,u1)  Var(u2)     ...  Cov(u2,un) |  =  | 0    σ²  ...  0  |
| ...         ...         ...  ...        |     | ...  ... ...  ...|
| Cov(un,u1)  Cov(un,u2)  ...  Var(un)    |     | 0    0   ...  σ² |

• Slide 15/28

Covariance is the measure of how much two random variables vary together (as distinct from variance, which measures how much a single variable varies).

The covariance between two random variables, say X and Y, is defined as

Cov(X, Y) = E[(X − μX)(Y − μY)]

where μX and μY are the expected values of X and Y respectively.

If X and Y are independent, their covariance is zero.
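A quick numerical check of this definition, using simulated independent draws:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)   # independent of x

# Sample analogue of Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]
print(np.mean((x - x.mean()) * (y - y.mean())))   # ~0 for independent X, Y
```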

• Slide 16/28

The assumption implies that the disturbance term relating to any observation is not influenced by the disturbance term relating to any other observation.

For example:

1. Suppose we are dealing with quarterly time series data involving a regression of the following specification (time series data):

• Slide 17/28

Output (Q) = f(labour and capital input)

Q      L     K     U
Q1.1   L1    K1    U1
Q1.2   L2    K2    U2
Q1.3   L3    K3    U3
Q1.4   L4    K4    U4
Q2.1   ...   ...   ...
...    ...   ...   ...
Qn.4   L4n   K4n   U4n

If output in one quarter is affected by a labour strike (say through U3), there is no reason to believe that this effect will be carried over to U4.

• Slide 18/28

2. Let family consumption expenditure = f(income)
(a regression involving cross-section data)

Consumption expenditure    Income of
of families                families
F1                         I1           U1
F2                         I2           U2
...                        ...          ...
Fn                         In           Un

• Slide 19/28

The effect of an increase in one family's income on its consumption expenditure is not expected to affect the consumption expenditure of another family.

The reality:

1. Disruption caused by a strike may affect production in later periods.
2. The consumption expenditure of one family may influence that of another family, i.e. 'keeping up with the Joneses' (the demonstration effect).

Autocorrelation is a feature of most time-series data. In cross-section data it is referred to as spatial autocorrelation.


• Slide 21/28

Therefore, in regressions involving time series data, successive observations are likely to be interdependent, which shows up as a systematic pattern in the ui's.

2. Specification bias: excluded variable(s) or incorrect functional form.

a) When some relevant variables have been excluded from the model, they will leave a systematic pattern in the ui's.

b) In the case of an incorrect functional form, i.e. fitting a linear function when the true relationship is non-linear (and vice versa), there will be either over-estimation or under-estimation of the dependent variable, which will have a systematic impact on the ui's.

• Slide 22/28

Example:

(Correct)   MC = β1 + β2(output) + β3(output)² + ui
(Incorrect) MC = b1 + b2(output) + vi

where vi = β3(output)² + ui, and hence vi will catch the systematic effect of (output)² on MC, leading to serial correlation in the ui's.
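A simulation makes the point concrete. In the sketch below (coefficients and data are invented), fitting the incorrect linear model to data generated by the quadratic one leaves a serially correlated residual pattern:

```python
import numpy as np

rng = np.random.default_rng(4)
output = np.linspace(1, 10, 50)
# True (quadratic) marginal-cost relation; coefficients are invented.
mc = 5 + 2 * output + 0.5 * output**2 + rng.normal(scale=1.0, size=50)

# Fit the incorrect linear specification.
X = np.column_stack([np.ones(50), output])
b, *_ = np.linalg.lstsq(X, mc, rcond=None)
resid = mc - X @ b

# The omitted (output)^2 term leaves a U-shaped residual pattern:
# the lag-1 residual correlation is strongly positive.
print(np.corrcoef(resid[:-1], resid[1:])[0, 1])
```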

3. Cobweb Phenomenon:

The supply of many agricultural commodities reflects the so-called cobweb phenomenon, where supply reacts to price with a lag of one time period because supply decisions take time to implement (the gestation period).

Expl.: at the beginning of this year's planting of crops, farmers are influenced by the price prevailing last year.

• Slide 23/28

Suppose that at the end of period t the price Pt turns out to be lower than Pt−1. Therefore, in period t+1 farmers may decide to produce less than they did in period t. Such phenomena are known as cobweb phenomena, and they give a systematic pattern to the ui's.

The same problem arises with household expenditure, share prices, etc. In general, when a lagged variable that belongs in the model is not included (as in many cases), the ui's are correlated.

• Slide 24/28

4. Manipulation of time series data:

(i) Extrapolation of the values of variables such as population gives rise to serial dependence in the successive ui's.

(ii) Very often we use projected population figures to arrive at per capita figures for a macro variable; when such figures are used in forecasting with regression, the successive ui's are serially correlated.

• Slide 25/28

Consequences (proofs are not given):

In the presence of autocorrelation in a model:

a) The residual variance is likely to underestimate the true σ².
b) R² is likely to be overestimated.
c) t tests are not valid and, if applied, are likely to give misleading conclusions.

Although the OLS estimators remain linear and unbiased, they do not have minimum variance, which invalidates the t and F tests.

• Slide 26/28

Detection of autocorrelation:

The assumption of the CLRM relates to the population disturbance terms, which are not directly observable. Therefore their proxies, the OLS residuals ûi, are obtained and examined for the presence or absence of autocorrelation.

There are various methods. Some of them are:

1. Graphical method
2. Runs test (a non-parametric test): examines the signs of the residuals
3. DW statistic: a decision rule is applied (see the sketch below)
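As a sketch of the third method: the DW statistic can be computed directly from the OLS residuals via its standard formula d = Σ(ût − ût−1)² / Σût². The helper below is illustrative:

```python
import numpy as np

def durbin_watson(resid):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
    d near 2: no first-order autocorrelation; d -> 0: positive; d -> 4: negative."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Check on artificially autocorrelated residuals (rho = 0.8):
rng = np.random.default_rng(5)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.8 * e[t - 1] + rng.normal()
print(durbin_watson(e))   # well below 2, signalling positive autocorrelation
```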

• Slide 27/28

Remedial Measures:

Data transformation by

a) the first-difference method, (Xt − Xt−1); one degree of freedom is lost

b) the ρ transformation, with ρ estimated from the data; the transformed model becomes

(Yt − ρYt−1) = β1(1 − ρ) + β2(Xt − ρXt−1) + (ut − ρut−1)

This is known as the generalised or quasi-difference equation.
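A minimal sketch of the ρ transformation (a Cochrane-Orcutt-style step; the function and variable names are illustrative): ρ is estimated from the lag-1 relation of the OLS residuals, the series are quasi-differenced, and OLS is re-run on the transformed data. Note the fitted intercept then estimates β1(1 − ρ).

```python
import numpy as np

def quasi_difference(y, x, resid):
    """One quasi-differencing step; names are illustrative.
    rho is estimated from the lag-1 relation of the OLS residuals."""
    rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
    y_star = y[1:] - rho * y[:-1]   # Y_t - rho * Y_{t-1}
    x_star = x[1:] - rho * x[:-1]   # X_t - rho * X_{t-1}
    # Re-estimate OLS on (y_star, x_star); the fitted intercept
    # estimates beta1 * (1 - rho), per the transformed model above.
    return y_star, x_star, rho
```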

• Slide 28/28

Exercise 4 (refer to Ch. 10 & 12 of DNG)

Use the time series data in MR.

Find the correlation table and see the extent of multicollinearity.

Test for autocorrelation; in its presence, use the ρ transformation.

Addressing both problems, calculate the forecast error and select the equation which gives the minimum forecast error.