Statistics: Multiple Regression

Upload: dr-singh

Post on 06-Apr-2018


  • 8/3/2019 Statistics- Multiple Regression

    1/23

    2/10/2012

    Multiple Regression: Predicting One Factor from Several Others


    Predicting a single Y variable from two or more X variables

    Describe and Understand the Relationship
        Understand the effect of one X variable while holding the others fixed

    Forecast (Predict) a New Observation
        Lets you use all available information (X variables) to find out about what you don't know (the Y variable for this new situation)

    Adjust and Control a Process
        Because the regression equation (you hope) tells you what would happen if you made a change


    n cases (elementary units)
    k explanatory X variables

              Y                 X1                    ...   Xk
              (dependent        (first independent          (last independent
              variable to be    or explanatory              or explanatory
              explained)        variable)                   variable)
    Case 1    10.9              2.0                   ...   12.5
    Case 2    23.6              4.0                   ...   12.3
    ...       ...               ...                         ...
    Case n    6.0               0.5                   ...   7.0


    Intercept: a
        Predicted value for Y when every X is 0

    Regression Coefficients: b1, b2, ..., bk
        The effect of each X on Y, holding all other X variables constant

    Prediction Equation or Regression Equation
        (Predicted Y) = a + b1X1 + b2X2 + ... + bkXk
        The predicted Y, given the values for all X variables

    Prediction Errors or Residuals
        (Actual Y) - (Predicted Y)
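A minimal sketch of how the intercept a and the coefficients b1, ..., bk might be estimated, and how predictions and residuals follow from them. It assumes ordinary least squares, which the slides do not name explicitly. The function names `fit_least_squares` and `predict` and the toy data are hypothetical, and in practice statistical software would do this step.

```python
def fit_least_squares(rows, y):
    """Return [a, b1, ..., bk] minimizing the sum of squared residuals."""
    # Design matrix: a leading 1 for the intercept, then the X values.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Normal equations (X'X) coef = X'y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(design, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * pv for x, pv in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coef, row):
    """(Predicted Y) = a + b1*X1 + ... + bk*Xk."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))


# Hypothetical toy data, built so that Y = 1 + 2*X1 + 3*X2 exactly.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y_vals = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]

coef = fit_least_squares(x_rows, y_vals)
# Residual = (Actual Y) - (Predicted Y); essentially zero for this exact data.
residuals = [yi - predict(coef, row) for row, yi in zip(x_rows, y_vals)]
print([round(c, 6) for c in coef])
```

Because the toy data follow the equation exactly, the fit recovers a = 1, b1 = 2, b2 = 3 up to rounding error. Real data would leave nonzero residuals.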


    Standard Error of Estimate: Se or S
        Approximate size of errors made predicting Y

    Coefficient of Determination: R²
        Percentage of variability in Y explained by the X variables as a group

    F Test: Significant or Not Significant
        Tests whether the X variables, as a group, can predict Y better than just randomly


    t Tests for Individual Regression Coefficients
        Significant or not significant, for each X variable
        Tests whether a particular X variable has an effect on Y, holding the other X variables constant
        Should be performed only if the F test is significant

    Standard Errors of the Regression Coefficients: Sb1, Sb2, ..., Sbk (with n - k - 1 degrees of freedom)
        Indicate the estimated sampling standard deviation of each regression coefficient
        Used in the usual way to find confidence intervals and hypothesis tests for individual regression coefficients


    Input Data
        To predict cost of ads from magazine characteristics

                    Y             X1             X2         X3
                    Page Costs    Audience       Percent    Median
                    (color ad)    (thousands)    Male       Income
    Audubon         $25,315       1,645          51.1       $38,787
    Better Homes    198,000       34,797         22.1       41,933
    ...             ...           ...            ...        ...
    YM              73,270        3,109          14.4       43,696


    Predicted Page Costs
    = a + b1X1 + b2X2 + b3X3
    = $4,043 + 3.79(Audience) - 124(Percent Male) + 0.903(Median Income)

    Intercept a = $4,043
        Essentially a base rate, representing the cost of advertising in a magazine that has no audience, no male readers, and zero income level
        But there are no such magazines
        The intercept a is merely there to help achieve the best predictions


    Predicted Page Costs
    = a + b1X1 + b2X2 + b3X3
    = $4,043 + 3.79(Audience) - 124(Percent Male) + 0.903(Median Income)

    Regression coefficient b1 = 3.79
        All else equal: the effect of Audience on Page Costs, while holding Percent Male and Median Income constant
        The effect of Audience on Page Costs, adjusted for Percent Male and Median Income
        On average, Page Costs are estimated to be $3.79 higher for a magazine with one more (thousand) Audience, as compared to another magazine with the same Percent Male and Median Income


    Predicted Page Costs
    = a + b1X1 + b2X2 + b3X3
    = $4,043 + 3.79(Audience) - 124(Percent Male) + 0.903(Median Income)

    Regression coefficient b2 = -124
        All else equal: the effect of Percent Male on Page Costs, while holding Audience and Median Income constant
        The effect of Percent Male on Page Costs, adjusted for Audience and Median Income
        On average, Page Costs are estimated to be $124 lower for a magazine with one more percentage point of male readers, as compared to another magazine with the same Audience and Median Income
        But don't believe it! We will see that it is not significant


    Predicted Page Costs
    = a + b1X1 + b2X2 + b3X3
    = $4,043 + 3.79(Audience) - 124(Percent Male) + 0.903(Median Income)

    Regression coefficient b3 = 0.903
        All else equal: the effect of Median Income on Page Costs, while holding Audience and Percent Male constant
        The effect of Median Income on Page Costs, adjusted for Audience and Percent Male
        On average, Page Costs are estimated to be $0.903 higher for a magazine with one more dollar of Median Income, as compared to another magazine with the same Audience and Percent Male


    Predicted Page Costs for Audubon
    = a + b1X1 + b2X2 + b3X3
    = $4,043 + 3.79(Audience) - 124(Percent Male) + 0.903(Median Income)
    = $4,043 + 3.79(1,645) - 124(51.1) + 0.903(38,787)
    = $38,966

    Actual Page Costs are $25,315

    Residual is $25,315 - $38,966 = -$13,651
        Audubon has Page Costs $13,651 lower than you would expect for a magazine with its characteristics (Audience, Percent Male, and Median Income)
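The Audubon arithmetic can be checked with a short sketch. It uses the rounded coefficients printed on the slides and takes b2 as negative, since the slides describe Page Costs as $124 lower per percentage point of male readers. The function name `predicted_page_cost` is a hypothetical label for the regression equation.

```python
# Coefficients as rounded on the slides (b2 is negative: "$124 lower").
a, b1, b2, b3 = 4043.0, 3.79, -124.0, 0.903


def predicted_page_cost(audience_thousands, percent_male, median_income):
    """Predicted Page Costs = a + b1*X1 + b2*X2 + b3*X3."""
    return a + b1 * audience_thousands + b2 * percent_male + b3 * median_income


# Audubon's row from the input data.
audubon_predicted = predicted_page_cost(1645, 51.1, 38787)
audubon_actual = 25315

# Residual = (Actual Y) - (Predicted Y).
audubon_residual = audubon_actual - audubon_predicted

print(round(audubon_predicted), round(audubon_residual))  # prints 38966 -13651
```

The negative residual is what identifies Audubon as cheaper than the equation would suggest for its audience profile.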


    Standard Error of Estimate Se
        Indicates the approximate size of the prediction errors
        About how far are the Y values from their predictions?

    For the magazine data
        Se = S = $21,578
        Actual Page Costs are about $21,578 from their predictions for this group of magazines (using regression)
        Compare to SY = $45,446: Actual Page Costs are about $45,446 from their average (not using regression)
        Using the regression equation to predict Page Costs (instead of simply using the average of Y) reduces the typical error from about $45,446 to about $21,578


    Coefficient of Determination R²
        Indicates the percentage of the variation in Y that is explained by (or attributed to) all of the X variables
        How well do the X variables explain Y?

    For the magazine data
        R² = 0.787 = 78.7%
        The X variables (Audience, Percent Male, and Median Income) taken together explain 78.7% of the variance of Page Costs
        This leaves 100% - 78.7% = 21.3% of the variation in Page Costs unexplained
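The standard error of estimate and R² are two views of the same unexplained variation, and the reported figures can be cross-checked against each other. The sketch below assumes the usual least-squares definitions: unexplained sum of squares = (1 - R²) times the total, total = (n - 1) SY², and Se² = unexplained sum of squares / (n - k - 1). It uses n = 55 and k = 3, the values quoted for the magazine data in the F test discussion.

```python
import math

# Figures reported on the slides for the magazine data.
n, k = 55, 3          # cases and X variables
r_squared = 0.787     # coefficient of determination
s_y = 45446.0         # standard deviation of Page Costs (no regression)

# Se^2 = (1 - R^2) * (n - 1) * SY^2 / (n - k - 1)
s_e = s_y * math.sqrt((1 - r_squared) * (n - 1) / (n - k - 1))

print(round(s_e))  # close to the reported Se of $21,578
```

The result differs from $21,578 by only a few dollars, which is consistent with R² having been rounded to three decimals.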


    Linear Model for the Population
        Y = (α + β1X1 + β2X2 + ... + βkXk) + ε
          = (Population relationship) + Randomness
        Where ε has a normal distribution with mean 0 and constant standard deviation σ, and this randomness is independent from one case to another
        An assumption needed for statistical inference


    Table 12.1.7

                               Population                Sample
                               (parameters: fixed        (estimators: random
                               and unknown)              and known)
    Intercept or constant      α                         a
    Regression coefficients    β1, β2, ..., βk           b1, b2, ..., bk
    Uncertainty in Y           σ                         S or Se


    Is the regression significant?
        Do the X variables, taken together, explain a significant amount of the variation in Y?
        The null hypothesis claims that, in the population, the X variables do not help explain Y; all coefficients are 0
            H0: β1 = β2 = ... = βk = 0
        The research hypothesis claims that, in the population, at least one of the X variables does help explain Y
            H1: At least one of β1, β2, ..., βk ≠ 0


    Three equivalent methods for performing the F test; they always give the same result

    Use the p-value
        If p < 0.05, then the test is significant
        Same interpretation as p-values in Chapter 10

    Use the R² value
        If R² is larger than the value in the R² table, then the result is significant
        Do the X variables explain more than just randomness?

    Use the F statistic
        If the F statistic is larger than the value in the F table, then the result is significant


    For the magazine data, the X variables (Audience, Percent Male, and Median Income) explain a very highly significant percentage of the variation in Page Costs
        The p-value, listed as 0.000, is less than 0.0005, and is therefore very highly significant (since it is less than 0.001)
        The R² value, 78.7%, is greater than 27.1% (from the R² table at level 0.1% with n = 55 and k = 3), and is therefore very highly significant
        The F statistic, 62.84, is greater than the value (between 7.054 and 6.171) from the F table at level 0.1%, and is therefore very highly significant
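The R² method and the F statistic method agree because F is a direct function of R², n, and k. The sketch below uses the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)). The identity is not written out on the slides, so treat it as an assumption about how the reported F was obtained. The variable name `f_table_upper_bound` is a hypothetical label for the larger of the two table values quoted above.

```python
# Figures reported on the slides for the magazine data.
n, k = 55, 3
r_squared = 0.787

# Explained variation per X variable, divided by
# unexplained variation per residual degree of freedom.
f_statistic = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# The slides bracket the 0.1%-level F table value between 6.171 and 7.054.
f_table_upper_bound = 7.054
is_very_highly_significant = f_statistic > f_table_upper_bound

print(round(f_statistic, 2), is_very_highly_significant)
```

The computed value lands within a few hundredths of the reported 62.84. The small gap comes from R² being rounded to 0.787.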


    A t test for each regression coefficient
        To be used only if the F test is significant
        If F is not significant, you should not look at the t tests

    Does the jth X variable have a significant effect on Y, holding the other X variables constant?

    Hypotheses are
        H0: βj = 0, H1: βj ≠ 0

    Test using the confidence interval bj ± t Sbj
        Use the t table with n - k - 1 degrees of freedom

    Or use the t statistic, t = bj / Sbj
        Compare to the t table value with n - k - 1 degrees of freedom


    Testing b1, the coefficient for Audience
        b1 = 3.79, t = 13.5, p = 0.000
        Audience has a very highly significant effect on Page Costs, after adjusting for Percent Male and Median Income

    Testing b2, the coefficient for Percent Male
        b2 = -124, t = -0.90, p = 0.374
        Percent Male does not have a significant effect on Page Costs, after adjusting for Audience and Median Income

    Testing b3, the coefficient for Median Income
        b3 = 0.903, t = 2.44, p = 0.018
        Median Income has a significant effect on Page Costs, after adjusting for Audience and Percent Male
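The confidence interval method gives the same verdicts as the t statistics. The sketch below recovers each standard error as Sbj = bj / t from the reported values, then forms bj ± t(table) × Sbj. The t table value 2.008 (two-sided 95%, 51 degrees of freedom) is an approximate figure that does not appear on the slides. The coefficient and t statistic for Percent Male are taken as negative, matching the "$124 lower" interpretation.

```python
# Approximate two-sided 95% t table value for n - k - 1 = 51 degrees of freedom
# (an assumption; not quoted on the slides).
T_TABLE_95 = 2.008

# (coefficient, t statistic) as reported for the magazine data.
coefficients = {
    "Audience": (3.79, 13.5),
    "Percent Male": (-124.0, -0.90),
    "Median Income": (0.903, 2.44),
}


def confidence_interval(b, t_stat, t_table=T_TABLE_95):
    """Return (low, high) for b +/- t_table * S_b, with S_b = b / t_stat."""
    s_b = b / t_stat
    half_width = t_table * s_b
    return b - half_width, b + half_width


intervals = {name: confidence_interval(b, t) for name, (b, t) in coefficients.items()}

for name, (low, high) in intervals.items():
    # A coefficient is significant at the 5% level when 0 lies outside the interval.
    verdict = "not significant" if low <= 0 <= high else "significant"
    print(f"{name}: {low:.3f} to {high:.3f} ({verdict})")
```

Only the Percent Male interval contains 0, which matches its p-value of 0.374.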


    Standardized Regression Coefficients
        Indicate relative importance of the information each X variable brings in addition to the others
        Ordinary regression coefficients are in different units
            And cannot be compared without standardization
        Defined as bj SXj / SY for the jth X variable
        Compare the absolute values

    Correlation Coefficients
        Indicate relative importance of the information each X variable brings without adjusting for the other X variables
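A small sketch of the standardized coefficient bj × SXj / SY. The standard deviations of the magazine X variables are not given on the slides, so the example uses hypothetical one-variable data. With a single X variable the standardized coefficient equals the correlation coefficient, which makes the result easy to check. The function name `standardized_coefficient` is illustrative.

```python
from statistics import mean, stdev


def standardized_coefficient(b_j, x_values, y_values):
    """Rescale a regression coefficient to standard-deviation units: b_j * S_Xj / S_Y."""
    return b_j * stdev(x_values) / stdev(y_values)


# Hypothetical data with one X variable (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

slope = sxy / sxx                        # least-squares coefficient b1
correlation = sxy / (sxx * syy) ** 0.5   # correlation between X and Y

standardized = standardized_coefficient(slope, x, y)
print(round(standardized, 4), round(correlation, 4))  # the two values agree
```

With several X variables the two measures generally differ, because the standardized coefficient adjusts for the other X variables and the correlation does not.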


    Multicollinearity
        When some X variables are too similar to one another
        Might do a good job of explaining and predicting Y
        But t tests might not be significant because no X variable is bringing new information

    Variable Selection
        How to choose from a long list of X variables?
        Too many: waste the information in the data
        Too few: risk ignoring useful predictive information

    Model Misspecification
        Perhaps the multiple regression linear model is