BMS2024 Multiple Linear Regression - Lesson 1



    ADVANCED MANAGERIAL STATISTICS

    Multiple Linear Regression


    Objectives

    Apply multiple regression analysis to business decision-making situations

    Analyze and interpret the computer output for a multiple regression model

    Test the significance of the multiple regression model

    Test the significance of the independent variables in a multiple regression model


    Recap: Simple Linear Regression

    What is regression analysis?

    What is meant by a linear relationship?

    What are dependent and independent (predictor) variables?


    The Multiple Regression Model

    Idea: Examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi)

    Multiple regression model with k independent variables:

    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$

    where $\beta_0$ is the Y-intercept, $\beta_1, \dots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.


    Multiple Regression Equation

    The coefficients of the multiple regression model are estimated using sample data.

    Multiple regression equation with k independent variables:

    $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}$

    where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \dots, b_k$ are the estimated slope coefficients.

    In this chapter, we will always use Excel to obtain the regression slope coefficients and other regression summary measures.
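    As an illustration of what Excel computes behind the scenes, here is a minimal Python sketch of the same least-squares estimation (assuming numpy is available; the handful of rows below are placeholder data taken from the pie-sales example introduced on the next slide):

        import numpy as np

        # Placeholder data: each row is one observation,
        # columns are the independent variables X1 (price) and X2 (advertising).
        X = np.array([[5.50, 3.3],
                      [7.50, 3.3],
                      [8.00, 3.0],
                      [8.00, 4.5],
                      [6.80, 3.0]])
        y = np.array([350.0, 460.0, 350.0, 430.0, 350.0])   # dependent variable Y

        # Prepend a column of ones so the first coefficient is the intercept b0.
        X_design = np.column_stack([np.ones(len(y)), X])

        # Ordinary least squares: returns [b0, b1, ..., bk].
        coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        print(coeffs)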


    Example: 2 Independent Variables

    A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.

    Dependent variable: Pie sales (units per week)
    Independent variables: Price (in $), Advertising (in $100s)

    Data are collected for 15 weeks


    Pie Sales Example

    Multiple regression equation: Sales = b0 + b1(Price) + b2(Advertising)

    Week   Pie Sales   Price ($)   Advertising ($100s)
      1       350        5.50            3.3
      2       460        7.50            3.3
      3       350        8.00            3.0
      4       430        8.00            4.5
      5       350        6.80            3.0
      6       380        7.50            4.0
      7       430        4.50            3.0
      8       470        6.40            3.7
      9       450        7.00            3.5
     10       490        5.00            4.0
     11       340        7.20            3.5
     12       300        7.90            3.2
     13       440        5.90            4.0
     14       450        5.00            3.5
     15       300        7.00            2.7
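    As a cross-check on the Excel output shown on the following slides, the same model can be fitted in Python (a sketch, assuming numpy is installed); the coefficients should come out close to those reported later (intercept ≈ 306.526, Price ≈ -24.975, Advertising ≈ 74.131):

        import numpy as np

        # Pie-sales data from the table above (15 weeks).
        sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                          450, 490, 340, 300, 440, 450, 300], dtype=float)
        price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                          7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
        advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                                3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

        # Design matrix with an intercept column: Sales = b0 + b1*Price + b2*Advertising.
        X = np.column_stack([np.ones_like(sales), price, advertising])
        b, *_ = np.linalg.lstsq(X, sales, rcond=None)
        print(b)   # expect roughly [306.526, -24.975, 74.131]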


    Estimating a Multiple Linear Regression Equation

    Excel will be used to generate the coefficients and measures of goodness of fit for multiple regression.

    Excel: Data / Data Analysis... / Regression

    Instructions are attached here (Excel Tips on Regression Analysis-2013.docx).

    Multiple Linear Regression: Excel Summary Output


    Multiple Regression Excel Output

    Regression Statistics
      Multiple R           0.72213
      R Square             0.52148
      Adjusted R Square    0.44172
      Standard Error      47.46341
      Observations        15

    ANOVA               df    SS          MS          F         Significance F
      Regression         2    29460.027   14730.013   6.53861   0.01201
      Residual (Error)  12    27033.306    2252.776
      Total             14    56493.333

                   Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
      Intercept      306.52619      114.25389       2.68285   0.01993    57.58835   555.46404
      Price          -24.97509       10.83213      -2.30565   0.03979   -48.57626    -1.37392
      Advertising     74.13096       25.96732       2.85478   0.01449    17.55303   130.70888

    Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
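    For readers not using Excel, a comparable summary (R Square, the ANOVA F test, and per-coefficient t statistics, p-values and 95% intervals) can be produced in Python with the statsmodels package; this is a sketch under that assumption, reusing the 15 weeks of data:

        import numpy as np
        import statsmodels.api as sm

        sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                          450, 490, 340, 300, 440, 450, 300], dtype=float)
        price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                          7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
        advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                                3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

        # add_constant adds the intercept column, mirroring Excel's "Intercept" row.
        X = sm.add_constant(np.column_stack([price, advertising]))
        model = sm.OLS(sales, X).fit()
        print(model.summary())   # regression statistics, ANOVA F test, coefficient table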


    The Multiple Regression Equation

    Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

    where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.

    b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, holding advertising constant.

    b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, holding price constant.


    Using the Equation to Make Predictions

    Predict sales for a week in which the selling price is $5.50 and advertising is $350:

    Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
          = 306.526 - 24.975(5.50) + 74.131(3.5)
          = 428.62

    Predicted sales is 428.62 pies.

    Note that Advertising is in $100s, so $350 means that X2 = 3.5.
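    The same prediction as a quick Python check, using the fitted coefficients from the Excel output above:

        # Fitted coefficients (rounded) from the regression output.
        b0, b1, b2 = 306.526, -24.975, 74.131

        price = 5.50          # selling price in $
        advertising = 3.5     # $350 of advertising, expressed in $100s

        predicted_sales = b0 + b1 * price + b2 * advertising
        print(round(predicted_sales, 2))   # about 428.62 pies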


    Measures of Variation

    Total variation is made up of two parts:

    SST = SSR + SSE
    (Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

    $SST = \sum (Y_i - \bar{Y})^2 \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2 \qquad SSE = \sum (Y_i - \hat{Y}_i)^2$

    where:
    $\bar{Y}$ = average value of the dependent variable
    $Y_i$ = observed values of the dependent variable
    $\hat{Y}_i$ = predicted value of Y for the given $X_i$ value


    SST = total sum of squares
    Measures the variation of the $Y_i$ values around their mean $\bar{Y}$

    SSR = regression sum of squares
    Explained variation attributable to the relationship between X and Y

    SSE = error sum of squares
    Variation attributable to factors other than the relationship between X and Y
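    A short Python sketch (assuming numpy) that computes SST, SSR and SSE for the pie-sales data and confirms that SST = SSR + SSE; the values should match the ANOVA SS column in the Excel output (roughly 56493, 29460 and 27033):

        import numpy as np

        sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                          450, 490, 340, 300, 440, 450, 300], dtype=float)
        price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                          7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
        advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                                3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

        X = np.column_stack([np.ones_like(sales), price, advertising])
        b, *_ = np.linalg.lstsq(X, sales, rcond=None)
        fitted = X @ b                               # predicted Y values

        sst = np.sum((sales - sales.mean()) ** 2)    # total sum of squares
        ssr = np.sum((fitted - sales.mean()) ** 2)   # regression (explained) sum of squares
        sse = np.sum((sales - fitted) ** 2)          # error (unexplained) sum of squares

        print(sst, ssr, sse)
        print(np.isclose(sst, ssr + sse))            # True: SST = SSR + SSE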


    Measures of Variation

    [Figure: scatter plot with the regression line, showing how the deviation of each $Y_i$ from the mean $\bar{Y}$ (SST) splits into an explained part (SSR) and an unexplained part (SSE).]


    Coefficient of Determination, R²

    The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. The coefficient of determination is also called R-squared and is denoted as R².

    $R^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$

    note: $0 \le R^2 \le 1$


    Composition of Total Variation

    Total Variation = Explained Variation + Unexplained Variation
    SST = SS Regression (SSR) + SS Residual / SS Error (SSE)


    How Strong is The Model?

    R² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.

    [Figure: scatter plots illustrating R² = 1, with all points lying on the regression line.]


    0 < R² < 1: weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X.

    [Figure: scatter plots illustrating 0 < R² < 1.]


    R² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).

    [Figure: scatter plot illustrating R² = 0.]


    Coefficient of Determination

    From the ANOVA table in the Excel output above (SSR = 29460.0, SST = 56493.3):

    $R^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = 0.52148$

    52.1% of the variation in pie sales is explained by the variation in price and advertising. 47.9% is the unexplained variation.


    Adjusted Coefficient of Determination (R²adj)

    R² never decreases when a new X variable is added to the model.
    This can be a disadvantage when comparing models.

    What is the net effect of adding a new variable?
    We lose a degree of freedom when a new X variable is added.
    Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?


    Adjusted R²

    Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used
    (where n = sample size, p = number of independent variables)

    $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$

    Penalizes excessive use of unimportant independent variables
    Smaller than R²
    Useful in comparing among models


    Adjusted R² (computation)

    Using the pie sales example (where n = sample size, p = number of independent variables):

    $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1} = 1 - (1 - 0.5215)\frac{15-1}{15-2-1} = 0.4417$
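    The same adjustment written as a small Python helper (the function name is just illustrative):

        def adjusted_r2(r2: float, n: int, p: int) -> float:
            """Adjusted R-squared: penalizes R2 for the number of predictors p."""
            return 1 - (1 - r2) * (n - 1) / (n - p - 1)

        # Pie sales example: R2 = 0.5215, n = 15 weeks, p = 2 predictors.
        print(adjusted_r2(0.5215, 15, 2))   # approximately 0.4417, as above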


    Adjusted R²

    From the Excel output above: $R^2_{adj} = 0.44172$

    44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.


    Is the Model Significant?

    F-Test for Overall Significance of the Model

    Shows if there is a linear relationship between all of the X variables considered together and Y

    Use F-test statistic

    Hypotheses:

    H0: β1 = β2 = ... = βk = 0 (no linear relationship)

    H1: at least one βi ≠ 0 (at least one independent variable affects Y)


    F-Test for Overall Significance

    Test statistic:

    $F = \frac{MSR}{MSE} = \frac{SSR / p}{SSE / (n - p - 1)}$
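    The same test statistic and its p-value computed in Python (a sketch, assuming scipy is installed), using the sums of squares from the ANOVA table:

        from scipy import stats

        ssr, sse = 29460.027, 27033.306   # regression and error sums of squares
        p, n = 2, 15                      # predictors and observations

        msr = ssr / p                     # mean square regression
        mse = sse / (n - p - 1)           # mean square error
        f_stat = msr / mse
        p_value = stats.f.sf(f_stat, p, n - p - 1)   # upper-tail probability

        print(f_stat, p_value)            # about 6.5386 and 0.012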


    F-distribution

    Like the t-distribution, the shape of the F-distribution curve depends on the number of degrees of freedom (df).

    It has two degrees of freedom (i.e. df numerator and df denominator).

    It is right skewed, but the skewness decreases as the df increase.

    Characteristics:

    The F-distribution is continuous and skewed to the right.

    The values of an F-distribution are nonnegative.


    Critical Value: F-distribution

    α = 0.05

    Critical value: $F_{\alpha,\,df_1,\,df_2}$
    Degree of freedom (df1) = p
    Degree of freedom (df2) = (n - p - 1), where p = number of predictors

    [Figure: F-distribution curve with the rejection region in the right tail beyond $F_{\alpha,\,df_1,\,df_2}$ and the non-rejection region to its left.]

    Decision Rule:
    If the F test statistic > $F_{\alpha,\,df_1,\,df_2}$, or the p-value < α = 0.05, reject H0; otherwise do not reject H0.
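    The critical value can be read from F tables or computed, for example with scipy (a sketch):

        from scipy import stats

        alpha = 0.05
        df1, df2 = 2, 12                  # p = 2, n - p - 1 = 15 - 2 - 1 = 12

        # Point with upper-tail area alpha under the F(df1, df2) distribution.
        f_critical = stats.f.ppf(1 - alpha, df1, df2)
        print(f_critical)                 # about 3.89

        # Pie-sales example: F = 6.5386 exceeds the critical value, so reject H0.
        print(6.5386 > f_critical)        # True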


    F-Test for Overall Significance

    From the ANOVA table in the Excel output (with 2 and 12 degrees of freedom):

    $F = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386$

    p-value for the F-test = 0.012 (Significance F in the output).


    H0: β1 = β2 = 0
    H1: β1 and β2 not both zero

    α = 0.05
    df1 = 2, df2 = 12

    Test Statistic: F = MSR / MSE = 6.5386

    Decision: Since the F test statistic is in the rejection region (p-value = 0.012 < α = 0.05), reject H0.

    Conclusion: There is evidence that at least one independent variable affects Y.