Multiple Regression: Fitting Models for Multiple Independent Variables
By Ellen Ludlow



  • If you wanted to predict someone's weight based on their height, you would collect data by recording each subject's height and weight and fitting a model. Let's say our population is males ages 16-25, and this is a table of collected data...

    Height (in):  60   63   65   66   67   68   68   69   70   70   71   72   72   73   75
    Weight (lb): 120  135  130  143  137  149  144  150  156  152  154  162  169  163  168

  • Next, we graph the data...

    [Scatterplot: Weight (lbs) vs. Height (ins)]

  • Because the data looks linear, we fit a least-squares regression (LSR) line.

    [Scatterplot: Weight (lbs) vs. Height (ins), with the fitted LSR line]
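    As a quick check outside of Minitab, the LSR line can also be fit in a few lines of Python (a minimal sketch, assuming SciPy is available; the variable names are ours, not part of the original slides):

        # Fit a least-squares regression (LSR) line to the height/weight data.
        from scipy import stats

        height = [60, 63, 65, 66, 67, 68, 68, 69, 70, 70, 71, 72, 72, 73, 75]
        weight = [120, 135, 130, 143, 137, 149, 144, 150, 156, 152, 154, 162, 169, 163, 168]

        # Simple linear regression of weight on height.
        fit = stats.linregress(height, weight)
        print(f"weight-hat = {fit.intercept:.2f} + {fit.slope:.3f} * height")
        print(f"r-squared = {fit.rvalue ** 2:.3f}")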


  • But weight isn't the only factor that has an impact on someone's height. The height of someone's parents may be another predictor. With multiple regression you may have more than one independent variable, so you could use someone's weight and his parents' height to predict his own height.

  • Our new table, with the average height of each subject's parents added to the data, looks like this...

    Height (in):           60   63   65   66   67   68   68   69   70   70   71   72   72   73   75
    Weight (lb):          120  135  130  143  137  149  144  150  156  152  154  162  169  163  168
    Parents' height (in):  59   67   62   59   71   66   71   67   69   73   69   75   72   69   73

  • This data can't be graphed like simple linear regression, because there are two independent variables. There is software, however, such as Minitab, that can analyze data with multiple independent variables. Let's take a look at a Minitab output for our data...

  • What does all this mean?

    Predictor       Coef     Stdev   t-ratio       p
    Constant      25.028     4.326      5.79   0.000
    weight       0.24020   0.03140      7.65   0.000
    parenth      0.11493   0.09035      1.27   0.227

    s = 1.165   R-sq = 92.6%   R-sq(adj) = 91.4%

    Analysis of Variance
    SOURCE        DF        SS        MS       F       p
    Regression     2    205.31    102.65   75.62   0.000
    Error         12     16.29      1.36
    Total         14    221.60
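    If you don't have Minitab, the same coefficients can be reproduced with NumPy's generic least-squares solver (a sketch, assuming the data table above is the complete data set; this is not Minitab's own method, just an equivalent fit):

        # Multiple regression of height on weight and parents' height.
        import numpy as np

        height  = np.array([60, 63, 65, 66, 67, 68, 68, 69, 70, 70, 71, 72, 72, 73, 75])
        weight  = np.array([120, 135, 130, 143, 137, 149, 144, 150, 156, 152, 154, 162, 169, 163, 168])
        parenth = np.array([59, 67, 62, 59, 71, 66, 71, 67, 69, 73, 69, 75, 72, 69, 73])

        # Design matrix: a column of ones for the constant term, then one column per predictor.
        X = np.column_stack([np.ones(len(height)), weight, parenth])
        coef, *_ = np.linalg.lstsq(X, height, rcond=None)
        print(f"Constant = {coef[0]:.3f}, weight = {coef[1]:.5f}, parenth = {coef[2]:.5f}")
        # Should match the Coef column above (25.028, 0.24020, 0.11493) up to rounding.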

  • First, let's look at the multiple regression model. The general model for multiple regression is similar to the model for simple linear regression.

    Simple linear regression model: y = β0 + β1x + ε

    Multiple regression model: y = β0 + β1x1 + β2x2 + ... + βkxk + ε

  • Just like linear regression, when you fit a multiple regression to data, the terms in the model equation are statistics, not parameters. A multiple regression model using statistical notation looks like...

    ŷ = b0 + b1x1 + b2x2 + ... + bkxk

    where k is the number of independent variables.

  • The multiple regression model for our data is

    ŷ = 25.028 + 0.24020(weight) + 0.11493(parenth)

    We get the coefficient values from the Minitab output:

    Predictor       Coef     Stdev   t-ratio       p
    Constant      25.028     4.326      5.79   0.000
    weight       0.24020   0.03140      7.65   0.000
    parenth      0.11493   0.09035      1.27   0.227
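    Prediction is then just a plug-in calculation; for example (the subject's weight of 150 lb and parents' average height of 68 in are made-up values, not from the data table):

        # Predict a height from the fitted model reported by Minitab.
        b0, b_weight, b_parenth = 25.028, 0.24020, 0.11493

        weight, parenth = 150, 68    # hypothetical subject
        height_hat = b0 + b_weight * weight + b_parenth * parenth
        print(f"predicted height = {height_hat:.1f} in")    # about 68.9 in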

  • Once the regression is fitted, we need to know how well the model fits the data. First, we check and see if there is a good overall fit.

    Then, we test the significance of each independent variable. You will notice that this is the same way we test for significance in simple linear regression.

  • The Overall Test

    Hypotheses:

    H0: β1 = β2 = ... = βk = 0 (all independent variables are unimportant for predicting y)

    HA: at least one βj ≠ 0 (at least one independent variable is useful for predicting y)

  • How do you calculate the F-statistic? It can easily be found in the Minitab output, along with the p-value:

    SOURCE        DF        SS        MS       F       p
    Regression     2    205.31    102.65   75.62   0.000
    Error         12     16.29      1.36
    Total         14    221.60

    Or you can calculate it by hand.

  • But, before you can calculate the F-statistic, you need to be introduced to some other terms.

    Regression sum of squares (regression SS): the variation in Y accounted for by the regression model with respect to the mean model.

    Error sum of squares (error SS): the variation in Y not accounted for by the regression model.

    Total sum of squares (total SS): the total variation in Y.

  • Now that we understand these terms, we need to know how to calculate them:

    Regression SS = Σ(ŷi − ȳ)²

    Error SS = Σ(yi − ŷi)²

    Total SS = Σ(yi − ȳ)²

    Total SS = Regression SS + Error SS
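    In code, all three sums of squares fall straight out of the observed and fitted values (a sketch; y_hat is assumed to hold the fitted values from whatever regression you ran):

        # Sums of squares from observed values y and fitted values y_hat.
        import numpy as np

        def sums_of_squares(y, y_hat):
            y_bar = y.mean()
            regression_ss = np.sum((y_hat - y_bar) ** 2)   # variation explained by the model
            error_ss      = np.sum((y - y_hat) ** 2)       # variation left unexplained
            total_ss      = np.sum((y - y_bar) ** 2)       # total variation in Y
            return regression_ss, error_ss, total_ss       # total_ss = regression_ss + error_ss

    With our Minitab values, 205.31 + 16.29 = 221.60, as the identity requires.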

  • There are also regression mean squares, error mean squares, and total mean squares (abbreviated MS). To calculate these terms, you divide each sum of squares by its respective degrees of freedom:

    Regression d.f. = k

    Error d.f. = n − k − 1

    Total d.f. = n − 1

    where k is the number of independent variables and n is the total number of observations used to fit the regression.

  • So:

    Regression MS = Regression SS / k

    Error MS = Error SS / (n − k − 1)

    Total MS = Total SS / (n − 1)

    (Unlike the sums of squares, the mean squares do not add: Regression MS + Error MS ≠ Total MS in general, because each is divided by a different d.f.)

  • Both sum of squares and mean squares values can be found in Minitab:

    SOURCE        DF        SS        MS       F       p
    Regression     2    205.31    102.65   75.62   0.000
    Error         12     16.29      1.36
    Total         14    221.60

    Now we can calculate the F-statistic.

  • Test Statistic and Distribution

    Test statistic:

    F = Regression MS / Error MS = 102.65 / 1.36 ≈ 75.48

    which is very close to the F-statistic from Minitab (75.62); the small difference is due to rounding of the mean squares.
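    The same arithmetic in code, with the p-value computed from the F-distribution rather than looked up in a table (a sketch using the SS and d.f. values from the Minitab ANOVA table):

        # F-statistic and p-value from the ANOVA table values.
        from scipy import stats

        k, n = 2, 15                          # independent variables, observations
        regression_ss, error_ss = 205.31, 16.29

        regression_ms = regression_ss / k     # 102.65
        error_ms = error_ss / (n - k - 1)     # about 1.36
        F = regression_ms / error_ms
        p = stats.f.sf(F, k, n - k - 1)       # upper-tail p-value
        print(f"F = {F:.2f}, p = {p:.6f}")    # F is about 75.6, p is approximately 0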

  • The p-value for the F-statistic is then found in an F-distribution table. As you saw before, it can also be easily calculated by software.

    A small p-value rejects the null hypothesis that none of the independent variables are significant. That is to say, at least one of the independent variables is significant.

  • The conclusion in the context of our data is: we have strong evidence (p ≈ 0) to reject the null hypothesis. That is to say, either someone's weight or his parents' average height is significant in predicting his height. Once you know that at least one independent variable is significant, you can go on to test each independent variable separately.

  • Testing Individual Terms: if an independent variable does not contribute significantly to predicting the value of Y, the coefficient of that variable will be 0. The test of these hypotheses determines whether the estimated coefficient is significantly different from 0. From this, we can tell whether an independent variable is important for predicting the dependent variable.

  • Test for Individual Terms:

    H0: βj = 0 (the independent variable xj is not important for predicting y)

    HA: βj ≠ 0 (the independent variable xj is important for predicting y)

    where j represents a specified independent variable.

  • Test Statistic:

    t = bj / SE(bj)

    d.f. = n − k − 1

    Remember, this test is only to be performed if the overall model test is significant.

  • t-Distribution: tests of individual terms for significance are the same as a test of significance in simple linear regression.

  • A small p-value means that the independent variable is significant.

    Predictor       Coef     Stdev   t-ratio       p
    Constant      25.028     4.326      5.79   0.000
    weight       0.24020   0.03140      7.65   0.000
    parenth      0.11493   0.09035      1.27   0.227

    This test of significance shows that weight is a significant independent variable for predicting height (p ≈ 0), but average parent height is not (p = 0.227).
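    The t-ratios and p-values can be reproduced from the Coef and Stdev columns alone (a sketch; the dictionary of terms is just a convenient way to loop over the two predictors):

        # t-ratio and two-sided p-value for each coefficient.
        from scipy import stats

        n, k = 15, 2
        df = n - k - 1                                    # 12
        terms = {"weight": (0.24020, 0.03140), "parenth": (0.11493, 0.09035)}

        for name, (coef, stdev) in terms.items():
            t = coef / stdev
            p = 2 * stats.t.sf(abs(t), df)                # two-sided p-value
            print(f"{name}: t = {t:.2f}, p = {p:.3f}")    # weight: 7.65, 0.000; parenth: 1.27, 0.227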

  • Now that you know how to do tests of significance for multiple regression, there are many other things that you can learn, such as:

    How to create confidence intervals

    How to use categorical variables in multiple regression

    How to test for significance in groups of independent variables