doane chapter 12a

Upload: alisson-tavares

Post on 14-Apr-2018


TRANSCRIPT

  • 7/29/2019 Doane Chapter 12a

    1/36

  • 7/29/2019 Doane Chapter 12a

    2/36

    Bivariate Regression (Part 1)

    Chapter 12

    Visual Displays and Correlation Analysis

    Bivariate Regression

    Regression Terminology

    Ordinary Least Squares Formulas

    Tests for Significance

  • 7/29/2019 Doane Chapter 12a

    3/36

    Visual Displays and Correlation Analysis

    Visual Displays

    Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.

    A scatter plot
    - displays each observed data pair $(x_i, y_i)$ as a dot on an X/Y grid
    - indicates visually the strength of the relationship between the two variables

    McGraw-Hill/Irwin 2007 The McGraw-Hill Companies, Inc. All rights reserved.

  • 7/29/2019 Doane Chapter 12a

    4/36

    Visual Displays and Correlation Analysis

    Visual Displays

  • 7/29/2019 Doane Chapter 12a

    5/36

    Visual Displays and Correlation Analysis

    Correlation Analysis

    The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y.

    $-1 \le r \le +1$

    Values of r near -1 indicate a strong negative relationship, values near +1 indicate a strong positive relationship, and r = 0 indicates no linear relationship.

    In Excel, use =CORREL(array1,array2), where array1 is the range for X and array2 is the range for Y.
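
    As a rough counterpart to Excel's CORREL, the sketch below computes the sample correlation coefficient directly from sums of squares and cross-products. The function name sample_correlation and the data values are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import math

def sample_correlation(x, y):
    """Sample correlation r between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares around the sample means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 4))
```

    Reversing the sign of every Y value flips the sign of r but not its magnitude, which is one quick way to check such a function.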

  • 7/29/2019 Doane Chapter 12a

    6/36

    Visual Displays and Correlation Analysis

    Correlation Analysis

  • 7/29/2019 Doane Chapter 12a

    7/36

    Visual Displays and Correlation Analysis

    Correlation Analysis

    Scatter plot panels: Strong Positive Correlation; Weak Positive Correlation

  • 7/29/2019 Doane Chapter 12a

    8/36

    Visual Displays and Correlation Analysis

    Correlation Analysis

    Scatter plot panels: Weak Negative Correlation; Strong Negative Correlation

  • 7/29/2019 Doane Chapter 12a

    9/36

    Visual Displays and Correlation Analysis

    Correlation Analysis

    Scatter plot panels: No Correlation; Nonlinear Relation

  • 7/29/2019 Doane Chapter 12a

    10/36

    Visual Displays and Correlation Analysis

    Tests for Significance

    r is an estimate of the population correlation coefficient $\rho$ (rho).

    To test the hypothesis $H_0: \rho = 0$, the test statistic is:

    $t = r \sqrt{\dfrac{n - 2}{1 - r^2}}$

    The critical value $t_\alpha$ is obtained from Appendix D using $\nu = n - 2$ degrees of freedom for any $\alpha$.

    Find the p-value using Excel's function =TDIST(t,deg_freedom,tails) or MINITAB.
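
    The test of H0: rho = 0 can be sketched in a few lines. This is a minimal illustration, assuming the usual statistic t = r * sqrt((n - 2) / (1 - r^2)); the function name and the sample values (r = 0.7746, n = 5) are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, to be compared with t on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical sample: r = 0.7746 computed from n = 5 observations
r, n = 0.7746, 5
t = correlation_t_statistic(r, n)
df = n - 2
print(round(t, 3), df)
```

    The resulting t would then be compared with a table value for 3 degrees of freedom (3.182 for a two-tailed test at alpha = .05), or converted to a p-value with a t distribution function such as Excel's TDIST.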

  • 7/29/2019 Doane Chapter 12a

    11/36

    Visual Displays and Correlation Analysis

    Tests for Significance

    Equivalently, you can calculate the critical value for the correlation coefficient using

    $r_\alpha = \dfrac{t_\alpha}{\sqrt{t_\alpha^2 + n - 2}}$

    This method gives a benchmark for the correlation coefficient. However, it yields no p-value and is inflexible if you change your mind about $\alpha$.

  • 7/29/2019 Doane Chapter 12a

    12/36

    Visual Displays and Correlation Analysis

    Steps in Testing if $\rho = 0$

    Step 1: State the Hypotheses
    Determine whether you are using a one- or two-tailed test and the level of significance ($\alpha$).

    $H_0: \rho = 0$
    $H_1: \rho \ne 0$

    Step 2: Calculate the Critical Value
    For degrees of freedom $\nu = n - 2$, look up the critical value $t_\alpha$ in Appendix D, then calculate

    $r_\alpha = \dfrac{t_\alpha}{\sqrt{t_\alpha^2 + n - 2}}$

  • 7/29/2019 Doane Chapter 12a

    13/36

    Visual Displays and Correlation Analysis

    Steps in Testing if $\rho = 0$

    Step 3: Make the Decision
    If the sample correlation coefficient r exceeds the critical value $r_\alpha$ (in absolute value for a two-tailed test), then reject $H_0$.
    If using the t statistic method, reject $H_0$ if $t > t_\alpha$ (use $|t|$ for a two-tailed test) or if the p-value $< \alpha$.
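
    The critical-value method in the steps above can be sketched as follows. The code assumes the conversion r_alpha = t_alpha / sqrt(t_alpha^2 + n - 2) and takes t_alpha as an input looked up by the user; the function names and sample values are hypothetical.

```python
import math

def critical_r(t_alpha, n):
    """Critical value of r implied by a critical t with n - 2 df."""
    return t_alpha / math.sqrt(t_alpha ** 2 + n - 2)

def reject_h0(r, t_alpha, n):
    """Two-tailed decision: reject H0: rho = 0 when |r| exceeds critical r."""
    return abs(r) > critical_r(t_alpha, n)

# Hypothetical inputs: n = 5 and alpha = .05 two-tailed,
# so the table value is t = 3.182 for 3 degrees of freedom
r_crit = critical_r(3.182, 5)
print(round(r_crit, 4))
print(reject_h0(0.7746, 3.182, 5))
```

    With so few observations the benchmark is high, so even a fairly large sample r fails to reach significance in this example.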

  • 7/29/2019 Doane Chapter 12a

    14/36

    Visual Displays and Correlation Analysis

    Quick Rule for Significance

    A quick test for significance of a correlation at $\alpha = .05$ is $|r| > 2/\sqrt{n}$
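
    A small sketch of the quick rule, assuming the threshold 2 / sqrt(n). The function name and the sample sizes are hypothetical; the loop only illustrates how the threshold shrinks as n grows.

```python
import math

def quick_rule_threshold(n):
    """Approximate critical |r| at alpha = .05 using the 2 / sqrt(n) rule."""
    return 2 / math.sqrt(n)

def quick_rule_significant(r, n):
    """True when |r| exceeds the quick-rule threshold."""
    return abs(r) > quick_rule_threshold(n)

# Hypothetical sample sizes: the threshold falls as n increases
for n in (10, 25, 100, 400):
    print(n, round(quick_rule_threshold(n), 3))
```

    The same r = 0.30 would fail the rule with 25 observations but pass it with 100, which is why significance alone says little about the strength of a relationship.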

  • 7/29/2019 Doane Chapter 12a

    15/36

    Visual Displays and Correlation Analysis

    Role of Sample Size

    As sample size increases, the critical value of r becomes smaller.

    This makes it easier for smaller values of the sample correlation coefficient to be considered significant.

    A larger sample does not mean that the correlation is stronger, nor does its significance imply importance.

  • 7/29/2019 Doane Chapter 12a

    16/36

    Bivariate Regression

    What is Bivariate Regression?

    Bivariate regression analyzes the relationship between two variables.

    It specifies one dependent (response) variable and one independent (predictor) variable.

    This hypothesized relationship may be linear, quadratic, or of some other form.

  • 7/29/2019 Doane Chapter 12a

    17/36

    Bivariate Regression

    Model Form

  • 7/29/2019 Doane Chapter 12a

    18/36

    Regression Terminology

    Models and Parameters

    Unknown parameters are
    $\beta_0$ Intercept
    $\beta_1$ Slope

    The assumed model for a linear relationship is

    $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for all observations ($i = 1, 2, \ldots, n$)

    The error term is not observable and is assumed to be normally distributed with mean 0 and standard deviation $\sigma$.
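
    To make the assumed model concrete, the sketch below generates observations from y_i = beta0 + beta1 * x_i + error, with normally distributed errors. The parameter values, the seed, and the function name are hypothetical and serve only as an illustration.

```python
import random

def simulate_linear_model(x, beta0, beta1, sigma, seed=0):
    """Generate y_i = beta0 + beta1 * x_i + e_i with e_i ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]

# Hypothetical parameter values, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = simulate_linear_model(x, beta0=2.0, beta1=0.5, sigma=1.0)
print(y)
```

    In practice only x and y are observed; the parameters and the individual errors remain unknown and must be estimated from the data.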

  • 7/29/2019 Doane Chapter 12a

    19/36

    Regression Terminology

    Models and Parameters

    The fitted model used to predict the expected value of Y for a given value of X is

    $\hat{y}_i = b_0 + b_1 x_i$

    The fitted coefficients are
    $b_0$ the estimated intercept
    $b_1$ the estimated slope

    The residual is $e_i = y_i - \hat{y}_i$.

    Residuals may be used to estimate $\sigma$, the standard deviation of the errors.
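
    Fitted values and residuals can be computed as in the sketch below. The data and the coefficient values b0 = 2.2 and b1 = 0.6 (the least squares estimates for this particular data set) are hypothetical, as are the function names.

```python
def fitted_values(x, b0, b1):
    """Predicted value of Y at each x: y_hat_i = b0 + b1 * x_i."""
    return [b0 + b1 * xi for xi in x]

def residuals(y, y_hat):
    """Residual for each observation: e_i = y_i - y_hat_i."""
    return [yi - fi for yi, fi in zip(y, y_hat)]

# Hypothetical data and fitted coefficients
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

y_hat = fitted_values(x, b0, b1)
e = residuals(y, y_hat)
print([round(v, 2) for v in y_hat])
print([round(v, 2) for v in e])
```

    A positive residual means the observation lies above the fitted line, and a negative residual means it lies below.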

  • 7/29/2019 Doane Chapter 12a

    20/36

    Regression Terminology

    Fitting a Regression on a Scatter Plot in Excel

    Step 1:
    - Highlight the data columns.
    - Click on the Chart Wizard and choose Scatter Plot.
    - In the completed graph, click once on the points in the scatter plot to select the data.
    - Right-click and choose Add Trendline.
    - Choose Options and check Display Equation.

  • 7/29/2019 Doane Chapter 12a

    21/36

    Regression Terminology

    Fitting a Regression on a Scatter Plot in Excel

  • 7/29/2019 Doane Chapter 12a

    22/36

    Regression Terminology


  • 7/29/2019 Doane Chapter 12a

    23/36

    Ordinary Least Squares Formulas

    Slope and Intercept

    The ordinary least squares (OLS) method estimates the slope and intercept of the regression line so that the sum of squared residuals is as small as possible.

    The sum of the residuals is zero: $\sum_{i=1}^{n} e_i = 0$

    The sum of the squared residuals is SSE: $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

  • 7/29/2019 Doane Chapter 12a

    24/36

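    Ordinary Least Squares Formulas

    Slope and Intercept

    The OLS estimator for the slope is:

    $b_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

    or, equivalently,

    $b_1 = \dfrac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$

    The OLS estimator for the intercept is:

    $b_0 = \bar{y} - b_1 \bar{x}$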
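
    The OLS slope and intercept can be computed with a short function. This is a minimal sketch using the standard deviations-from-the-mean form of the estimators; the function name ols_fit and the data are hypothetical.

```python
def ols_fit(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: cross-products of deviations over squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: forces the line through the point (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)
print(round(b0, 4), round(b1, 4))

# The residuals from an OLS fit with an intercept sum to zero
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(abs(sum(resid)) < 1e-9)
```

    Because the intercept is defined from the sample means, the fitted line always passes through the point of means.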

  • 7/29/2019 Doane Chapter 12a

    25/36

    Ordinary Least Squares Formulas

    Assessing Fit

    We want to explain the total variation in Y around its mean (SST, the total sum of squares):

    $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$

    The regression sum of squares (SSR) is the explained variation in Y:

    $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

  • 7/29/2019 Doane Chapter 12a

    26/36

    Ordinary Least Squares Formulas

    Assessing Fit

    The error sum of squares (SSE) is the unexplained variation in Y:

    $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

    If the fit is good, SSE will be relatively small compared to SST.

    A perfect fit is indicated by SSE = 0.

    The magnitude of SSE depends on n and on the units of measurement.
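
    The three sums of squares can be computed side by side, as in the sketch below. The data and the fitted line y_hat = 2.2 + 0.6 x are hypothetical, and the function name is an assumption made for illustration.

```python
def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse

# Hypothetical data with its OLS fit y_hat = 2.2 + 0.6 x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]

sst, ssr, sse = sums_of_squares(y, y_hat)
print(round(sst, 4), round(ssr, 4), round(sse, 4))
```

    For an OLS fit that includes an intercept, SST equals SSR plus SSE apart from floating-point rounding, so a small SSE relative to SST means most of the variation is explained.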

  • 7/29/2019 Doane Chapter 12a

    27/36

    Ordinary Least Squares Formulas

    Coefficient of Determination

    $R^2$ is a measure of relative fit based on a comparison of SSR and SST.

    $R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$

    $0 \le R^2 \le 1$

    Often expressed as a percent, an $R^2 = 1$ (i.e., 100%) indicates perfect fit.

    In a bivariate regression, $R^2 = (r)^2$
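
    A short derivation shows why the coefficient of determination equals the squared correlation in a bivariate regression. The shorthand S_xy, S_xx, and S_yy for the sums of cross-products and squares about the means is introduced here for convenience and is not notation from the slides. The derivation uses the OLS results b_1 = S_xy / S_xx and b_0 = y-bar minus b_1 times x-bar, so that each fitted deviation equals b_1 times the corresponding deviation of x.

```latex
\begin{aligned}
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \quad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = SST, \\
SSR &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
     = b_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2
     = \frac{S_{xy}^2}{S_{xx}}, \\
R^2 &= \frac{SSR}{SST}
     = \frac{S_{xy}^2}{S_{xx}\, S_{yy}}
     = \left( \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} \right)^2
     = r^2 .
\end{aligned}
```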

  • 7/29/2019 Doane Chapter 12a

    28/36

    Tests for Significance

    Standard Error of Regression

    The standard error ($s_{yx}$) is an overall measure of model fit.

    $s_{yx} = \sqrt{\dfrac{SSE}{n - 2}}$

    If the fitted model's predictions are perfect (SSE = 0), then $s_{yx} = 0$. Thus, a small $s_{yx}$ indicates a better fit.

    $s_{yx}$ is used to construct confidence intervals.

    The magnitude of $s_{yx}$ depends on the units of measurement of Y and on data magnitude.
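
    The standard error of the regression can be sketched as below, assuming the bivariate formula s_yx = sqrt(SSE / (n - 2)). The data, the fitted line, and the function name are hypothetical.

```python
import math

def standard_error_of_regression(y, y_hat):
    """s_yx = sqrt(SSE / (n - 2)) for a bivariate regression."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - 2))

# Hypothetical data with its OLS fit y_hat = 2.2 + 0.6 x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]

s_yx = standard_error_of_regression(y, y_hat)
print(round(s_yx, 4))
```

    The result is in the same units as Y, so it is comparable only across models fitted to the same response variable.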

  • 7/29/2019 Doane Chapter 12a

    29/36

    Tests for Significance

    Confidence Intervals for Slope and Intercept

    Standard error of the slope:

    $s_{b_1} = \dfrac{s_{yx}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

    Standard error of the intercept:

    $s_{b_0} = s_{yx} \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

  • 7/29/2019 Doane Chapter 12a

    30/36

    Tests for Significance

    Confidence Intervals for Slope and Intercept

    Confidence interval for the true slope:

    $b_1 - t_{n-2}\, s_{b_1} \le \beta_1 \le b_1 + t_{n-2}\, s_{b_1}$

    Confidence interval for the true intercept:

    $b_0 - t_{n-2}\, s_{b_0} \le \beta_0 \le b_0 + t_{n-2}\, s_{b_0}$
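
    The standard errors and confidence intervals for the slope and intercept can be sketched together. The code assumes the usual bivariate formulas (s_b1 = s_yx / sqrt(Sxx) and s_b0 = s_yx * sqrt(1/n + x_bar^2 / Sxx)) and takes the critical t value as an input looked up by the user. The data, the estimates, and the function names are hypothetical.

```python
import math

def coefficient_standard_errors(x, s_yx):
    """Return (s_b0, s_b1) given the x values and the standard error s_yx."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    s_b1 = s_yx / math.sqrt(sxx)
    s_b0 = s_yx * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return s_b0, s_b1

def confidence_interval(estimate, std_error, t_crit):
    """Interval estimate -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

# Hypothetical inputs: five x values, OLS estimates, s_yx = sqrt(0.8),
# and the two-tailed 95% table value t = 3.182 for n - 2 = 3 df
x = [1, 2, 3, 4, 5]
b0, b1 = 2.2, 0.6
s_yx = math.sqrt(0.8)
t_crit = 3.182

s_b0, s_b1 = coefficient_standard_errors(x, s_yx)
slope_ci = confidence_interval(b1, s_b1, t_crit)
intercept_ci = confidence_interval(b0, s_b0, t_crit)
print(round(s_b1, 4), [round(v, 3) for v in slope_ci])
print(round(s_b0, 4), [round(v, 3) for v in intercept_ci])
```

    In this small example the slope interval includes zero, which agrees with a two-tailed test at the same alpha failing to reject a zero slope.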

  • 7/29/2019 Doane Chapter 12a

    31/36

    Tests for Significance

    Hypothesis Tests

    If $\beta_1 = 0$, then X cannot influence Y and the regression model collapses to a constant $\beta_0$ plus random error.

    The hypotheses to be tested are:

    $H_0: \beta_1 = 0$
    $H_1: \beta_1 \ne 0$

  • 7/29/2019 Doane Chapter 12a

    32/36

    Tests for Significance

    Hypothesis Tests

    A t test is used with $\nu = n - 2$ degrees of freedom. The test statistics for the slope and intercept are:

    Slope: $t = \dfrac{b_1 - 0}{s_{b_1}}$

    Intercept: $t = \dfrac{b_0 - 0}{s_{b_0}}$

    $t_{n-2}$ is obtained from Appendix D or Excel for a given $\alpha$.

    Reject $H_0$ if $t > t_\alpha$ (use $|t|$ for a two-tailed test) or if p-value $< \alpha$.
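
    The t tests for the slope and intercept can be sketched as below. The estimates, standard errors, and critical value are hypothetical inputs (rounded values for a five-observation example), and the function names are assumptions made for illustration.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

def reject_two_tailed(t, t_crit):
    """Reject H0 in a two-tailed test when |t| exceeds the critical value."""
    return abs(t) > t_crit

# Hypothetical estimates and standard errors from a fit with n = 5,
# with the two-tailed alpha = .05 table value t = 3.182 for 3 df
b1, s_b1 = 0.6, 0.2828
b0, s_b0 = 2.2, 0.9381
t_crit = 3.182

t_slope = t_statistic(b1, s_b1)
t_intercept = t_statistic(b0, s_b0)
print(round(t_slope, 3), reject_two_tailed(t_slope, t_crit))
print(round(t_intercept, 3), reject_two_tailed(t_intercept, t_crit))
```

    In a bivariate regression the slope's t statistic matches the t statistic for testing a zero correlation, so the two tests always lead to the same decision.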

  • 7/29/2019 Doane Chapter 12a

    33/36

    Tests for Significance

    Using Excel

  • 7/29/2019 Doane Chapter 12a

    34/36

    Tests for Significance

    Using MegaStat

  • 7/29/2019 Doane Chapter 12a

    35/36

    Tests for Significance

    Using MINITAB

  • 7/29/2019 Doane Chapter 12a

    36/36

    Applied Statistics in Business and Economics

    End of Chapter 12 Part 1 of 2