16 analysis of variance (anova).pdf

Upload: fcojavierespinosa

Post on 02-Jun-2018


  • 8/10/2019 16 Analysis of Variance (ANOVA).pdf

    Copyright 2012 Pearson Education, Inc. Publishing as Addison-Wesley.

    CHAPTER 16

    Analysis of Variance (ANOVA)

    GENERAL OBJECTIVE

    In Chapter 10, we studied inferential methods for comparing the means of two populations. Now we will study analysis of variance, or ANOVA, which provides methods for comparing two or more population means. You should be familiar with the chapter that discusses analysis of variance in your textbook before beginning this chapter.

    LESSON OUTLINE

    16.1 The F-distribution
    16.2 One-Way ANOVA: The Logic
    16.3 One-Way ANOVA: The Procedure
    16.4 Multiple Comparisons*
    16.5 The Kruskal-Wallis Test*
    16.6 Problems

    16.1 The F-distribution

    Analysis of variance procedures rely on a distribution called the F-distribution, named in honor of Sir Ronald Fisher (1890-1962). A variable is said to have an F-distribution if its distribution has a special type of right-skewed curve, called an F-curve. There are infinitely many F-distributions, which we identify by stating two associated degrees of freedom: a degrees of freedom for the numerator and a degrees of freedom for the denominator. We will now study how SPSS can be used to find an F-value, $F_\alpha$, from this distribution.

    Finding the F-Value Having a Specified Area to Its Right

    Example 16.1 For an F-curve with degrees of freedom df = (4, 12), find $F_{0.05}$; that is, find the F-value having area 0.05 to its right for an F-distribution with 4 degrees of freedom in the numerator and 12 degrees of freedom in the denominator.

    Solution The SPSS function IDF.F(prob, df1, df2) returns the value from the F-distribution with the specified degrees of freedom, df = (df1, df2), for which the area to the left is prob. As when computing a t-score, we will use the Compute Variable dialog box.

    The F-value having area 0.05 to its right has area 0.95 to its left, since the total area under the probability curve is one. In the Numeric Expression box, type IDF.F(0.95, 4, 12). SPSS returns the F-value that has area 0.05 to its right as F = 3.26.
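    The same lookup can be cross-checked outside SPSS. The following is a minimal Python sketch, not SPSS code: the helper names f_pdf, f_cdf, and idf_f are hypothetical, the area under the F-curve is approximated with Simpson's rule, and the quantile is located by bisection. It assumes the numerator degrees of freedom are at least 2, so that the density is finite at 0.

```python
import math

def f_pdf(x, df1, df2):
    """Density of the F-distribution at x (assumes df1 >= 2)."""
    if x < 0:
        return 0.0
    a, b = df1 / 2.0, df2 / 2.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_const = a * math.log(df1 / df2) - log_beta
    return math.exp(log_const) * x ** (a - 1) * (1 + df1 * x / df2) ** (-(a + b))

def f_cdf(x, df1, df2, steps=2000):
    """Area to the left of x, by composite Simpson's rule on [0, x]."""
    if x <= 0:
        return 0.0
    h = x / steps  # steps must be even for Simpson's rule
    total = f_pdf(0.0, df1, df2) + f_pdf(x, df1, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, df1, df2)
    return total * h / 3

def idf_f(prob, df1, df2):
    """F-value with area `prob` to its left (a stand-in for SPSS IDF.F)."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < prob:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):                # bisection on the approximate CDF
        mid = (lo + hi) / 2
        if f_cdf(mid, df1, df2) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Area 0.05 to the right means area 0.95 to the left.
print(round(idf_f(0.95, 4, 12), 2))  # 3.26
```

    The result agrees with the value SPSS returns in Example 16.1. A statistics library would normally replace the hand-written integration.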

    16.2 One-Way ANOVA: The Logic

    Analysis of variance (ANOVA) provides methods for comparing several population means, that is, the means of a single variable from several populations. In this chapter, we study one-way analysis of variance. This type of ANOVA is called one-way analysis of variance because it compares the means of a variable for populations that result from a classification by one variable, called the factor. The possible values of the factor are referred to as the levels of the factor.

    One-way ANOVA is the generalization to more than two populations of the pooled t-procedure. As in the pooled t-procedure, we make the following assumptions.

    Assumptions (Conditions) for One-Way ANOVA

    1. Simple random samples: The samples taken from the populations under consideration are simple random samples.
    2. Independent samples: The samples taken from the populations under consideration are independent of one another.
    3. Normal populations: For each population, the variable under consideration is normally distributed.
    4. Equal standard deviations: The standard deviations of the variable under consideration are the same for all the populations.

    16.3 One-Way ANOVA: The Procedure

    The One-Way ANOVA Test

    Example 16.3 Energy Consumption: The U.S. Energy Information Administration gathers data on residential energy consumption and expenditures and publishes its findings in Residential Energy Consumption Survey: Consumption and Expenditures. Table 16 - 1 shows last year's energy consumptions for four independent random samples of households in the four U.S. regions.

    Table 16 - 1
    Energy consumption for samples of households in four U.S. regions

    Northeast  Midwest  South  West
        15        17      11     10
        10        12       7     12
        13        18       9      8
        14        13      13      7
        13        15              9
                  12

    At the 5% level of significance, do the data provide sufficient evidence to

    conclude that a difference exists in mean annual energy consumption by

    households in the four U.S. regions?

    Solution Type the data into two variables named ENERGY and REGION. ENERGY should contain all 20 data values in the four samples. REGION should take on the four values 1, 2, 3, and 4, which associate each case with a region. The values of REGION, 1, 2, 3, and 4, should be associated with the value labels Northeast, Midwest, South, and West, respectively.
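    To make the two-variable layout concrete, here is a small Python sketch of the same structure, with one (ENERGY, REGION) pair per case. It is an illustration only, not part of the SPSS procedure. The assignment of the two short table rows to regions is an assumption; it was chosen because it reproduces the sums of squares reported for this example.

```python
# Value labels for REGION, as described in the solution text.
REGION_LABELS = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}

# Samples from Table 16 - 1, keyed by REGION code.
# ASSUMPTION: the incomplete rows of the table belong to the regions shown here.
samples = {
    1: [15, 10, 13, 14, 13],
    2: [17, 12, 18, 13, 15, 12],
    3: [11, 7, 9, 13],
    4: [10, 12, 8, 7, 9],
}

# One case per row, (ENERGY, REGION): the layout typed into the data file.
cases = [(energy, region) for region, values in samples.items() for energy in values]

print(len(cases))  # number of cases in the data file
for region, values in samples.items():
    # label, sample size, sample mean
    print(REGION_LABELS[region], len(values), sum(values) / len(values))
```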

    Step 1: State the null and alternative hypotheses.

    Let $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ denote last year's mean energy consumptions for households in the Northeast, Midwest, South, and West, respectively. The null and alternative hypotheses are:

    $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ (mean consumptions are all equal)
    $H_a$: Not all the mean consumptions are equal

    Step 2: Decide on the significance level, $\alpha$.

    The test is to be performed at the 5% significance level. Thus $\alpha = 0.05$.

    Step 3: Compute the value of the test statistic.

    1. Test the hypotheses by choosing Analyze > Compare Means > One-Way ANOVA to open the One-Way ANOVA dialog box (Figure 16 - 1).

    Figure 16 - 1 One-Way ANOVA dialog box

    2. Paste the variable ENERGY into the Dependent List box and the variable REGION into the Factor box.

    3. Click the OK button to display the results of the one-way ANOVA in the Viewer window.

    The ANOVA table (Figure 16 - 2) shows several statistics used in analysis of variance.

    Figure 16 - 2 ANOVA table from One-Way ANOVA procedure

    The test statistic is F = 6.318. It has an F-distribution with df = (3, 16).

    Step 4: Obtain the p-value.

    The test statistic has an associated p-value of 0.005, which is given under the column titled Sig.

    Step 5: If P < $\alpha$, reject $H_0$; otherwise, do not reject $H_0$.

    The p-value is less than the specified significance level of 0.05; therefore, we reject the null hypothesis.

    Step 6: Interpret the results of the hypothesis test.

    At the 5% significance level, the data provide sufficient evidence to conclude that a difference exists in last year's mean energy consumption by households among the four U.S. regions. That is, at least two of the regions have different mean energy consumptions.

    The ANOVA Table

    The layout of the ANOVA table in SPSS is similar to the layout in the chapter, with the following exceptions. SPSS denotes Treatment by Between Groups and Error by Within Groups. This is because SSTR can be thought of as the error between the sample means and SSE can be thought of as the error within the samples. The values of SSTR = 97.5, SSE = 82.3, and SST = 179.8 can be read from the second column in the ANOVA table (Figure 16 - 2).

    The one-way ANOVA identity,

    SST = SSTR + SSE = 97.5 + 82.3 = 179.8,

    shows that the total variation among all the sample data can be partitioned into a component representing variation among the sample means and a component representing variation within samples. The associated degrees of freedom and mean squares are also reported in the ANOVA table.
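    The entries of the ANOVA table can be recomputed directly from the definitions of the sums of squares. The Python sketch below is an independent check, not SPSS output; the function name one_way_anova is hypothetical, and the assignment of the short table rows to regions is an assumption that reproduces the reported values.

```python
# Samples from Table 16 - 1.
# ASSUMPTION: the incomplete rows of the table belong to the regions shown here.
samples = {
    "Northeast": [15, 10, 13, 14, 13],
    "Midwest": [17, 12, 18, 13, 15, 12],
    "South": [11, 7, 9, 13],
    "West": [10, 12, 8, 7, 9],
}

def one_way_anova(groups):
    """Return (SSTR, SSE, SST, df_treatment, df_error, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    # Variation among the sample means (Between Groups).
    sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation within the samples (Within Groups).
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # Total variation, computed independently so the identity can be checked.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    f_stat = (sstr / (k - 1)) / (sse / (n - k))
    return sstr, sse, sst, k - 1, n - k, f_stat

sstr, sse, sst, df_tr, df_err, f_stat = one_way_anova(list(samples.values()))
print(f"Between Groups  SS={sstr:.1f}  df={df_tr}  F={f_stat:.3f}")
print(f"Within Groups   SS={sse:.1f}  df={df_err}")
print(f"Total           SS={sst:.1f}  df={df_tr + df_err}")
```

    The mean squares are SSTR / 3 and SSE / 16, and their ratio is the F statistic reported in Step 3.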

    16.4 Multiple Comparisons*

    When the null hypothesis is rejected in a one-way ANOVA, the conclusion is that the means are not all equal. Once you make that decision, you may also want to know which means are different, which is the largest, or, more generally, the relation among all the means. Methods for answering such questions are called multiple comparisons.

    SPSS provides several multiple comparison methods, including the Tukey multiple comparison method. In multiple comparisons, it is important to distinguish between the individual confidence level and the family confidence level. The individual confidence level is the confidence that any particular confidence interval contains the true difference between the corresponding population means; the family confidence level is the confidence that all the confidence intervals simultaneously contain their respective true differences.

    The Tukey multiple comparison method is based on the studentized range distribution. The Tukey multiple comparison method for obtaining confidence intervals for the differences between means is similar to the pooled t-interval formula. The essential difference is that, in the Tukey multiple comparison method, the percentile of a studentized range distribution is used instead of the percentile of a t-distribution. The effect of this is that the $(1-\alpha)$-level confidence intervals constructed by the Tukey multiple comparison method have a family confidence level of $1-\alpha$. By contrast, each of the $(1-\alpha)$-level confidence intervals constructed by the pooled t-interval formula has an individual confidence level of $1-\alpha$, so the family confidence level for that set of confidence intervals is smaller than $1-\alpha$.
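    A rough calculation shows how far the family confidence level can fall when individual pooled t-intervals are used for every pair of means. The Python sketch below uses the Bonferroni inequality, which is not part of the SPSS procedure described here and is introduced only as an illustration. It gives a guaranteed lower bound on the family confidence level, not its exact value.

```python
from math import comb

k = 4                # number of population means being compared (the four regions)
alpha = 0.05         # each pooled t-interval has individual confidence level 1 - alpha
pairs = comb(k, 2)   # number of pairwise differences between k means

# Bonferroni inequality: the family confidence level is at least
# 1 - pairs * alpha. This is a lower bound, not the exact level.
family_lower_bound = 1 - pairs * alpha

print(pairs)                         # 6
print(round(family_lower_bound, 2))  # 0.7
```

    With four regions there are six pairwise intervals, so six individual 95% intervals are only guaranteed a family confidence level of about 70%. The Tukey method is designed to keep the family level at 95%.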

    The Tukey Multiple-Comparison Method

    Example 16.6 Energy Consumption: Apply the Tukey multiple comparison method to the energy consumption data in Table 16 - 1. Use a family confidence level of 95%.

    Solution To perform Tukey multiple comparisons in SPSS,

    1. Click the Post Hoc... button in the One-Way ANOVA dialog box (Figure 16 - 1) to open the One-Way ANOVA: Post Hoc Multiple Comparisons dialog box (Figure 16 - 3).

    Figure 16 - 3 One-Way ANOVA: Post Hoc Multiple Comparisons dialog box

    2. Choose the checkbox for Tukey.

    3. A 95% family confidence level corresponds to a 5% significance level. Therefore, enter 0.05 into the Significance level box.

    4. Click the Continue button to close the dialog box and then click the OK button to display the results in the Viewer window.

    The Multiple Comparisons table (Figure 16 - 4) shows 95% confidence intervals for the differences using the Tukey multiple comparison method.

    Figure 16 - 4 Multiple Comparisons table for Tukey multiple comparison method

    For example, the confidence interval for the mean difference between the Northeast and Midwest regions is -5.429 to 2.429. Two population means are significantly different if the confidence interval for their difference does not include 0. This is true for the Midwest and South regions, for example. SPSS provides another table, the Homogeneous Subsets table (Figure 16 - 5), to help decipher which population means are different and which are not.
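    The Northeast minus Midwest interval can be approximated by hand from the ANOVA results. The Python sketch below uses the unequal-sample-size form of the Tukey interval (often called Tukey-Kramer). The studentized range critical value q_crit is an assumption: it is an approximate value for a 95% family confidence level with 4 means and 16 error degrees of freedom, read from a published table rather than computed here. The two samples are taken from Table 16 - 1 under the assumed assignment of its short rows.

```python
from math import sqrt

# Samples for the two regions being compared (Table 16 - 1).
northeast = [15, 10, 13, 14, 13]
midwest = [17, 12, 18, 13, 15, 12]

mse = 82.3 / 16   # SSE / (n - k), the Within Groups mean square
q_crit = 4.046    # ASSUMPTION: approximate table value of q_0.05 for 4 means, 16 df

def mean(values):
    return sum(values) / len(values)

diff = mean(northeast) - mean(midwest)
# The studentized range percentile divided by sqrt(2) plays the role
# that the t percentile plays in the pooled t-interval.
margin = (q_crit / sqrt(2)) * sqrt(mse) * sqrt(1 / len(northeast) + 1 / len(midwest))

lower, upper = diff - margin, diff + margin
print(round(lower, 2), round(upper, 2))  # -5.43 2.43
```

    The interval contains 0, so the data do not show a significant difference between the Northeast and Midwest means at the 95% family confidence level.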

    Figure 16 - 5 Homogeneous Subsets table from Tukey multiple comparison procedure

    Means that are lined up together in a column under Subset for alpha = 0.05 are judged not significantly different by the Tukey multiple comparison method. Two means that do not appear together in any column are judged not equal. That is, the data do not provide sufficient evidence that the population means for the West, South, and Northeast regions differ, or that the population means for the Northeast and Midwest regions differ. Further, since West and Midwest are in different columns, there is sufficient evidence that their population means are not equal. These results have a 95% family confidence level.

    16.5 The Kruskal-Wallis Test*

    The Kruskal-Wallis test is a nonparametric alternative to the one-way ANOVA procedure. The Kruskal-Wallis test examines whether several independent samples are from the same population. It applies when the distributions (one for each population) of the variable under consideration have the same shape, but it does not require that they be normal or have any other specific shape. Like the Mann-Whitney test, the Kruskal-Wallis test is based on ranks.

    The Kruskal-Wallis Test

    Example 16.8 Vehicle Miles: The U.S. Federal Highway Administration conducts annual surveys on motor vehicle travel by type of vehicle and publishes its findings in Highway Statistics. Independent simple random samples of cars, buses, and trucks were chosen, and the data on number of miles driven, in thousands, by each sampled vehicle last year are shown in Table 16 - 2.

    Table 16 - 2
    Number of miles driven (1000s) last year for independent samples of cars, buses, and trucks

    Cars   Buses  Trucks
    19.9    1.8    24.6
    15.3    7.2    37.0
     2.2    7.2    21.2
     6.8    6.5    23.6
    34.2   13.3    23.0
     8.3   25.4    15.3
    12.0           57.1
     7.0           14.5
     9.5           26.0
     1.1

    Preliminary data analysis (not shown) suggests that the distributions of miles driven have roughly the same shape for cars, buses, and trucks but that those distributions are far from normal. Thus the appropriate test is the Kruskal-Wallis procedure. At the 5% significance level, do the data provide sufficient evidence to conclude that a difference exists in last year's mean number of miles driven among cars, buses, and trucks?

    Solution The Kruskal-Wallis test is performed using the Tests for Several Independent Samples dialog box. Type the data into two variables named MILES and VEHICLE in a new data file. MILES should contain all 25 data values in the three samples. VEHICLE should take on the values 1, 2, and 3, associated with the value labels Cars, Buses, and Trucks, respectively.

    Step 1: State the null and alternative hypotheses.

    Let $\mu_1$, $\mu_2$, and $\mu_3$ denote last year's mean number of miles driven for cars, buses, and trucks, respectively. The null and alternative hypotheses are:

    $H_0: \mu_1 = \mu_2 = \mu_3$ (mean miles driven are all equal)
    $H_a$: Not all the means are equal

    Step 2: Decide on the significance level, $\alpha$.

    The test is to be performed at the 5% significance level. Thus $\alpha = 0.05$.

    Step 3: Compute the value of the test statistic.

    1. Test the hypotheses by choosing Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples to open the Tests for Several Independent Samples dialog box (Figure 16 - 6).

    2. Paste the variable MILES into the Test Variable List box and the variable VEHICLE into the Grouping Variable box.

    Figure 16 - 6 Tests for Several Independent Samples dialog box

    Next, we need to specify the minimum and maximum integer values for the grouping variable. The minimum value must be less than the maximum value. Cases associated with values outside the bounds are excluded during the analysis. This option is supplied so that the Kruskal-Wallis procedure can be performed on a subset of the samples.

    3. Click the Define Range button to open the Several Independent Samples: Define Range dialog box (Figure 16 - 7).

    Figure 16 - 7 Several Independent Samples: Define Range dialog box

    We require all the cases to be analyzed; consequently, enter 1, the minimum value in VEHICLE, into the Minimum box and 3, the maximum value in VEHICLE, into the Maximum box.

    4. Click the Continue button to close the dialog box and update the grouping variable information in the Tests for Several Independent Samples dialog box.

    5. Click the OK button to display the results in the Viewer window.

    The Ranks table (Figure 16 - 8) displays the mean rank for each of the three samples. If the population means are equal, we would expect the mean ranks to be approximately equal.

    Figure 16 - 8 Ranks table from Kruskal-Wallis procedure

    The Test Statistics table (Figure 16 - 9) gives the chi-square test statistic, the degrees of freedom associated with the test statistic, and the p-value of the hypothesis test.

    Figure 16 - 9 Test Statistics table from Kruskal-Wallis procedure

    The test statistic is H = 9.93, which has approximately a $\chi^2$-distribution with 2 degrees of freedom.

    Step 4: Obtain the p-value.

    The test statistic has an associated p-value of 0.007, which is given in the row titled Asymp. Sig.

    Step 5: If P < $\alpha$, reject $H_0$; otherwise, do not reject $H_0$.

    The p-value is less than the specified significance level of 0.05; therefore, we reject the null hypothesis.

    Step 6: Interpret the results of the hypothesis test.

    At the 5% significance level, the data provide sufficient evidence to conclude that at least one of the means is not equal to the others.
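    The mean ranks, the test statistic, and the p-value can be recomputed from the data as a check on the SPSS output. The Python sketch below is an independent calculation, not SPSS code. It ranks the pooled data with tied values sharing their average rank, applies the usual tie correction to H, and uses the fact that the upper-tail area of a chi-square distribution with 2 degrees of freedom is exp(-x/2). The helper name midranks is hypothetical, and the assignment of the short rows of Table 16 - 2 to vehicle types is an assumption consistent with the sample of 25 values.

```python
from math import exp

# Samples from Table 16 - 2 (thousands of miles).
# ASSUMPTION: the incomplete rows belong to the vehicle types shown here.
samples = {
    "Cars": [19.9, 15.3, 2.2, 6.8, 34.2, 8.3, 12.0, 7.0, 9.5, 1.1],
    "Buses": [1.8, 7.2, 7.2, 6.5, 13.3, 25.4],
    "Trucks": [24.6, 37.0, 21.2, 23.6, 23.0, 15.3, 57.1, 14.5, 26.0],
}

def midranks(values):
    """Map each distinct value to its rank, averaging ranks over ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 through j
        i = j
    return ranks

pooled = [x for group in samples.values() for x in group]
n = len(pooled)
rank_of = midranks(pooled)

rank_sums = {name: sum(rank_of[x] for x in group) for name, group in samples.items()}
mean_ranks = {name: rank_sums[name] / len(samples[name]) for name in samples}

# Kruskal-Wallis statistic before the tie correction.
h = 12 / (n * (n + 1)) * sum(
    rank_sums[name] ** 2 / len(samples[name]) for name in samples
) - 3 * (n + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
tie_counts = [pooled.count(v) for v in set(pooled)]
h_corrected = h / (1 - sum(t ** 3 - t for t in tie_counts) / (n ** 3 - n))

# Upper-tail chi-square area; this closed form holds for 2 degrees of freedom only.
p_value = exp(-h_corrected / 2)

print({name: round(r, 2) for name, r in mean_ranks.items()})
print(round(h_corrected, 2), round(p_value, 3))  # 9.93 0.007
```

    The mean rank for trucks is roughly twice that for cars or buses, which is what drives the large value of H.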

    16.6 Problems

    Problem 16.8 For the F-curve with df = (12, 5), find

    a. $F_{0.05}$
    b. $F_{0.01}$
    c. $F_{0.025}$

    Problem 16.10 For the F-curve with df = (6, 10), find

    a. $F_{0.05}$
    b. $F_{0.01}$
    c. $F_{0.025}$

    Problem 16.48 Movie fans use the annual Leonard Maltin Movie Guide for facts, cast members, and reviews of over 21,000 films. The movies are rated from 4 stars (4*), indicating a very good movie, to 1 star (1*), which Leonard Maltin refers to as a BOMB. Table 16 - 3 gives the running times, in minutes, of a random sample of films listed in one year's guide. At the 1% significance level, do the data provide sufficient evidence to conclude that a difference exists in mean running times among the four rating groups?

    Table 16 - 3
    Running times, in minutes

    1* or 1.5*  2* or 2.5*  3* or 3.5*   4*
        75          97         101       101
        95          70          89       135
        84         105          97        93
        86         119         103       117
        58          87          86       126
        85          95         100       119

    Problem 16.49 Copepods are tiny crustaceans that are an essential link in the estuarine food web. Marine scientists G. Weiss, G. McManus, and H. Harvey at the Chesapeake Biological Laboratory in Maryland designed an experiment to determine whether dietary lipid (fat) content is important in the population growth of a Chesapeake Bay copepod. Their findings were published as the paper "Development and Lipid Composition of the Harpacticoid Copepod Nitocra Spinipes Reared on Different Diets" (Marine Ecology Progress Series, vol. 132, pp. 57-61). Independent random samples of copepods were placed in containers containing lipid-rich diatoms, bacteria, or leafy macroalgae. There were 12 containers total, four replicates per diet. Five gravid (egg-bearing) females were placed in each container. Table 16 - 4 shows the number of copepods in each container after 14 days.

    Table 16 - 4
    Number of copepods

    Diatoms  Bacteria  Macroalgae
      426      303        277
      467      301        324
      438      293        302
      497      328        272

    a. Obtain the one-way ANOVA table for the data.
    b. Verify the one-way ANOVA identity.
    c. At the 5% significance level, do the data provide sufficient evidence to conclude that a difference exists in the mean number of copepods among the three different diets?

    Problem 16.95 Refer to Problem 16.49. Apply the Tukey multiple comparison method to the data in Table 16 - 3. Use a family confidence level of 95%.

    Problem 16.129 Indications are that Americans have become more aware of the dangers of excessive fat intake in their diets, although some reversal of this awareness appears to have developed in recent years. The U.S. Department of Agriculture publishes data on annual consumption of selected beverages in Food Consumption, Prices, and Expenditures. Independent random samples of lowfat-milk consumptions, measured in gallons, for 1980, 1995, and 2005 are given in Table 16 - 5.

    Table 16 - 5
    Lowfat milk consumptions, in gallons, for 1980, 1995, and 2005

    1980 1995 2005
    11.1 15.5 11.2
    10.7 16.0 12.7
    8.6 16.1 17.4
    9.4 14.7 17.1
    9.2 11.5 13.4
    15.1 17.1 11.4
    11.6 16.2 13.9
    8.3 14.6
    15.2

    At the 1% level of significance, do the data provide sufficient evidence to

    conclude that there is a difference in mean (per capita) consumption of lowfat

    milk for the years 1980, 1995, and 2005? Use the Kruskal-Wallis Test.