Multiple Regression


  • Slide 1/14

    Multiple Linear Regression

    A method for analyzing the effects of several predictor variables concurrently.

    - Simultaneously

    - Stepwise

    Minimizing the squared deviations from a plane.

    $\hat{y}_i = a + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki}$

    We will begin by focusing on simultaneous regression.

    The regression coefficient for each predictor is estimated while holding
    the other predictor variables constant. Thus, the slope for a particular
    predictor variable may change with the presence of different co-predictors
    or when used as a solitary predictor.
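
    As a quick illustration, here is a minimal NumPy sketch with simulated data
    (the coefficient values and variable names are illustrative assumptions,
    not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)      # co-predictor correlated with x1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

# Solitary predictor: the slope of x1 alone.
X_solo = np.column_stack([np.ones(n), x1])
b_solo = np.linalg.lstsq(X_solo, y, rcond=None)[0]

# Simultaneous regression: the slope of x1 with x2 held constant.
X_both = np.column_stack([np.ones(n), x1, x2])
b_both = np.linalg.lstsq(X_both, y, rcond=None)[0]

print("slope of x1 alone:       ", b_solo[1])   # absorbs part of x2's effect
print("slope of x1 alongside x2:", b_both[1])   # close to the specified 2.0
```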

  • Slide 2/14

    After testing bivariate assumptions, there may remain multivariate outliers,
    that is, outliers based on a combination of scores. For example, being 6 feet tall
    will not make one an outlier, nor will weighing 120 pounds, but being 6 feet tall
    and weighing 120 pounds will.

    Distance: based on residuals; identifies outliers on the criterion.

    Leverage: identifies outliers on the predictors (multivariate).

    Influence: combines distance and leverage to identify unusually influential
    observations. Cook's D measures how much change in the slope would occur if
    a single observation were removed (it is calculated for each observation).

    Tolerance: the degree to which a predictor cannot be predicted by the other
    predictors (1 minus the R² of that predictor regressed on the rest); low
    tolerance signals collinearity.

    Singularity: occurs when a predictor is perfectly predictable from the other predictors.
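
    A minimal sketch of leverage and Cook's D computed directly from the hat
    matrix, using simulated data (the design, seed, and coefficients are
    illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3                     # p = number of estimated parameters (incl. intercept)
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
h = np.diag(H)                           # leverage: outlyingness in the predictors
resid = y - H @ y                        # residuals: distance from the fitted plane
s2 = resid @ resid / (n - p)             # residual variance

# Cook's D per observation: how much the fit would shift if that case were removed.
cooks_d = resid**2 * h / (p * s2 * (1 - h) ** 2)

print("highest-leverage case:           ", h.argmax())
print("most influential case (Cook's D):", cooks_d.argmax())
```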

  • Slide 3/14

    Standardized Regression Coefficients

    This is sometimes related to the question of the relative importance of the predictors.

    Remember, slope is sensitive to units of measurement: the same relationship yields
    a different slope value at a different scale. For example, if x is measured in
    seconds, the value of the slope will be smaller than if x were measured in minutes.

    Standardized coefficients (betas) measure the change in the criterion
    (now measured in standard deviations) produced by a one standard
    deviation change in the predictor.

    Determining which predictor is more important is not merely a matter of comparing
    the betas, as some textbooks may suggest. There are theoretical and practical matters
    to be considered. Additionally, there is the matter of the variances found in the predictors
    across differing samples, i.e., the standard deviations may change.
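
    A sketch of how betas rescale raw slopes into standard-deviation units,
    i.e. beta_j = b_j * s_xj / s_y (the simulated data and scales are
    illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(scale=10.0, size=n)   # predictor measured in "large" units
x2 = rng.normal(scale=0.1, size=n)    # predictor measured in "small" units
y = 0.05 * x1 + 8.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Rescale each raw slope by sd(x) / sd(y) to get the standardized coefficient.
betas = b[1:] * np.array([x1.std(ddof=1), x2.std(ddof=1)]) / y.std(ddof=1)
print("raw slopes:", b[1:])   # dominated by the units of measurement
print("betas:     ", betas)   # expressed in standard-deviation units
```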

  • Slide 4/14

    Adding predictors may change the regression coefficients and the betas.

    R is the multiple correlation coefficient. It measures the degree of association
    (0 to 1.0) between the criterion and the predictor variables taken simultaneously.

    $R^2$ is the coefficient of multiple determination. It indicates the percentage of the
    variance in the criterion variable accounted for by the predictors taken together.

    Adding additional predictor variables will never reduce the coefficient of multiple
    determination, but that doesn't necessarily mean that the added predictors are either
    statistically or theoretically important. (An additional variable will fail to add to
    the coefficient of multiple determination only if, given the other predictors, it is
    uncorrelated with the criterion.)

    $R^2_{adj}$: Just as r is adjusted for n, $R^2$ can be adjusted for the number of
    predictor variables used in the regression.
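
    A sketch contrasting $R^2$ with adjusted $R^2$ when pure-noise predictors are
    added (the helper function and simulated data are illustrative assumptions):

```python
import numpy as np

def r2_and_adjusted(y, X):
    """R^2 and adjusted R^2; X must include the intercept column."""
    n, k = len(y), X.shape[1] - 1                      # k = number of predictors
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return r2, 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x1])
X6 = np.column_stack([X1, rng.normal(size=(n, 5))])   # five pure-noise predictors

print(r2_and_adjusted(y, X1))   # baseline model
print(r2_and_adjusted(y, X6))   # R^2 creeps up; adjusted R^2 does not reward the noise
```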

  • Slide 5/14

    Standard Error of the Estimated Coefficient

    is a measure of the variability that would be found among the different slopes
    estimated from other samples drawn from the same population (n held constant).

    This is analogous to the standard error of the mean and serves an analogous purpose.

    One way to look at the standard error of b is to see it as a measure of how sensitive

    the slope is to a change in a small number of data points from the sample.

  • Slide 6/14

    [Figure: six scatterplot panels, A1/A2, B1/B2, C1/C2, showing how slopes change
    when a few data points move.]

    In the three panels (A, B, C) the data points are fixed from left to right
    except for the five larger points. In the cases of A and B, a few changes in
    data points produced large changes in the slopes. This is not the case with C.
    Why? The variability in the x variable is greater in the case of C. Furthermore,
    the correlation is greater in the case of C (this latter means that the standard
    error will be smaller).

  • Slide 7/14

    Standard Error of the Slope

    $s_b = \dfrac{s_{y \cdot x}}{s_x \sqrt{N - 1}}$  with  $s_{y \cdot x} = \sqrt{\dfrac{\sum (y - \hat{y})^2}{N - 2}}$
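
    A numeric check of the formula on simulated data (sample size, seed, and
    coefficients are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 80
x = rng.normal(size=N)
y = 3.0 + 1.5 * x + rng.normal(size=N)

X = np.column_stack([np.ones(N), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b

s_yx = np.sqrt(resid @ resid / (N - 2))           # standard error of estimate
s_b = s_yx / (x.std(ddof=1) * np.sqrt(N - 1))     # standard error of the slope
print("slope:", b[1], " SE(b):", s_b)
```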

  • Slide 8/14

    In multiple regression we may wish to test a hypothesis concerning all of the

    predictors or some subset of the predictors, in addition to tests of the individual slopes.

    t-tests of individual coefficients, with all other predictors held constant

    F-tests of whether, taken together, the predictors significantly predict the criterion

    F-tests of whether some subset of predictors is significant

    $y = b_1 x_1 + b_2 x_2 + a$

    An example of a strange outcome:

    t-test of b1 may be non-significant

    t-test of b2 may be non-significant

    F-test of b1, b2 may be significant

    When two predictors are correlated, the standard errors of the coefficients are

    larger than they would be in the absence of the other predictor variable.

    $F = \dfrac{R^2 (N - k - 1)}{k (1 - R^2)}$  or  $F = \dfrac{(N - p - 1) R^2}{p (1 - R^2)}$

    k and p = # of predictors

    df = p (or k), (N − p (or k) − 1)
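
    A small helper, assuming the first form of the F formula above, that computes
    the overall F and its p-value with SciPy (the example numbers are illustrative):

```python
from scipy import stats

def overall_f(r2, n, k):
    """Overall F-test of a regression from its R^2; df = (k, n - k - 1)."""
    f = r2 * (n - k - 1) / (k * (1 - r2))
    return f, stats.f.sf(f, k, n - k - 1)

# Illustrative values: R^2 = .25 with N = 60 and k = 2 predictors.
print(overall_f(r2=0.25, n=60, k=2))
```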

  • Slide 9/14

    Limitations of tests of significance: both of individual predictors and of the overall model

    If there are small differences in the betas of the various predictors, different

    patterns of significance may easily arise from another sample. The relative

    variabilities of the predictors may change.

    A significant beta does NOT necessarily mean that the variable is of theoretical or

    of practical importance. The issue of IMPORTANCE is a difficult one. The relative size

    of the betas is not always the solution. (For example, your ability to manipulate a variable

    may be as important an issue in practical terms.)

  • Slide 10/14

    Difference between two $R^2$

    $F = \dfrac{(R^2_{larger} - R^2_{smaller})(N - k_{larger} - 1)}{(k_{larger} - k_{smaller})(1 - R^2_{larger})}$

    k = # of variables in each regression
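
    A sketch of this nested-model F-test as a small helper (the $R^2$ values and
    sample size in the example call are illustrative assumptions):

```python
from scipy import stats

def nested_f(r2_small, r2_large, n, k_small, k_large):
    """F-test for the gain in R^2 from a smaller nested model to a larger one;
    df = (k_large - k_small, n - k_large - 1)."""
    df1, df2 = k_large - k_small, n - k_large - 1
    f = (r2_large - r2_small) * df2 / (df1 * (1 - r2_large))
    return f, stats.f.sf(f, df1, df2)

print(nested_f(r2_small=0.30, r2_large=0.36, n=120, k_small=2, k_large=4))
```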

  • Slide 11/14

    Types of Analysis: Data types

    Cross-sectional: cases represent different objects at one point in time

    Time-Series: same object and variables are tested over time

    - a lagged dependent variable (criterion) (value at previous time) can

    be used as an independent variable (predictor)

    Continuous versus dummy variables

    Dummy variables: categorical, binary, dichotomous (0 and 1).

    There may be more than two categories. For example, there may be four categories;
    this would produce three dummy variables. Let us say that there are four types of
    people: A, B, C, and D. There would be three variables: A (yes/no, 0/1), B (0/1),
    and C (0/1); a case with zeros on all three belongs to the fourth category, which
    is reflected in the intercept.
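
    A sketch of this coding scheme with pandas (the labels mirror the A/B/C/D
    example; note that get_dummies drops the alphabetically first level, so here A,
    rather than D, becomes the reference category absorbed into the intercept):

```python
import pandas as pd

types = pd.Series(["A", "B", "D", "C", "A", "D"], name="type")

# Four categories -> three dummy columns; the dropped level is the reference
# category, read off the intercept.
dummies = pd.get_dummies(types, drop_first=True)
print(dummies)
```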

  • Slide 12/14

    Interactions: derived variables

    Between two continuous variables or between one continuous and

    one dummy variable.

    $y = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2$

    If x1 is the continuous variable, then b1 tells us its effect on the criterion
    when x2 = 0. b1 + b3 tells us the effect when x2 = 1, and b3 tells us the
    difference between the two slopes.

    For two continuous variables, the additional interaction term will indicate
    whether the effect of x1 at low values of x2 is greater or less than its effect
    at higher values of x2.

    Remember, adding additional predictor variables, even interaction terms, can
    change the betas of all other predictors.
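
    A simulated sketch of the continuous-by-dummy case (coefficient values and the
    moderator coding are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)                           # continuous predictor
x2 = rng.integers(0, 2, size=n).astype(float)     # dummy moderator (0/1)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])   # derived interaction column
a, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

print("slope of x1 when x2 = 0:", b1)        # ~2.0
print("slope of x1 when x2 = 1:", b1 + b3)   # ~3.5; b3 is the difference in slopes
```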

  • Slide 13/14

    Basic issues

    If you omit relevant variables from your regression, the betas of the included
    variables will be at best unreliable and at worst invalid.

    If you include an irrelevant predictor variable, the betas of the other relevant
    variables remain unbiased; however, to the extent that this irrelevant variable
    is correlated with some of the other predictors, it will increase the size of
    the standard errors (reducing power).

    If the underlying function between one or more of the predictors and the
    criterion is something other than linear, then the betas will be biased and
    unreliable. This is one reason why it is important to look at all bivariate
    plots prior to the analysis.
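
    A small simulation of the omitted-variable problem: when a relevant predictor
    correlated with x1 is left out, the slope of x1 absorbs its effect (all values
    are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=n)   # relevant, correlated with x1
y = x1 + x2 + rng.normal(size=n)                # both slopes truly 1.0

full = np.column_stack([np.ones(n), x1, x2])
omitted = np.column_stack([np.ones(n), x1])

b_full = np.linalg.lstsq(full, y, rcond=None)[0]
b_omit = np.linalg.lstsq(omitted, y, rcond=None)[0]

print("x1 slope, x2 included:", b_full[1])   # ~1.0, as specified
print("x1 slope, x2 omitted: ", b_omit[1])   # ~1.7: x1 also proxies for x2
```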

  • Slide 14/14

    Addressing Collinearity

    Ideally, you should collect new data that are free of multicollinearity. This
    usually requires an experimental design (creating true independent variables).
    This is usually not feasible, or it would have been done in the first place.

    1. Model respecification: combining correlated variables through various
    techniques or choosing to remove some. (Theoretical & statistical.)

    2. Statistical variable selection

    a. Stepwise procedures: can be deceptive and often fail to maximize $R^2$.

    b. Examine all subsets: may reveal subsets with similar $R^2$, but the resulting
    solution may not fit with either the research question or the theoretical
    approach. (See the sketch below.)
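
    A minimal all-subsets sketch using itertools over four simulated candidate
    predictors (illustrative data; in real use the choice should also weigh the
    research question and theory, as the slide warns):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)
n = 150
X_all = rng.normal(size=(n, 4))
y = X_all @ np.array([1.0, 0.8, 0.0, 0.0]) + rng.normal(size=n)

def r2(cols):
    """R^2 of the regression of y on the chosen predictor columns."""
    X = np.column_stack([np.ones(n), X_all[:, list(cols)]])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Examine every non-empty subset of the four candidate predictors.
for k in range(1, 5):
    for cols in combinations(range(4), k):
        print(cols, round(r2(cols), 3))
```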