kxu stat-anderson-ch10-student 2

1

Chapter 10 Comparisons Involving Means

$\mu_1 = \mu_2$ ?

ANOVA

Estimation of the Difference between the Means of Two Populations: Independent Samples

Hypothesis Tests about the Difference between the Means of Two Populations: Independent Samples

Inferences about the Difference between the Means of Two Populations: Matched Samples

Introduction to Analysis of Variance (ANOVA)

ANOVA: Testing for the Equality of k Population Means

2

Estimation of the Difference Between the Means of Two Populations: Independent Samples

Point Estimator of the Difference between the Means of Two Populations

Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

Interval Estimate of $\mu_1 - \mu_2$: Large-Sample Case

Interval Estimate of $\mu_1 - \mu_2$: Small-Sample Case

3

Point Estimator of the Difference Between the Means of Two Populations

Let $\mu_1$ equal the mean of population 1 and $\mu_2$ equal the mean of population 2. The difference between the two population means is $\mu_1 - \mu_2$.

To estimate $\mu_1 - \mu_2$, we will select a simple random sample of size $n_1$ from population 1 and a simple random sample of size $n_2$ from population 2. Let $\bar{x}_1$ equal the mean of sample 1 and $\bar{x}_2$ equal the mean of sample 2. The point estimator of the difference between the means of populations 1 and 2 is $\bar{x}_1 - \bar{x}_2$.

4

Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

Properties of the Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

• Expected Value

$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$

5

Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

Properties of the Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

– Standard Deviation

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

where: $\sigma_1$ = standard deviation of population 1

$\sigma_2$ = standard deviation of population 2

$n_1$ = sample size from population 1

$n_2$ = sample size from population 2
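The standard deviation of the sampling distribution follows from the independence of the two samples. A brief sketch of the reasoning, using the population standard deviations and sample sizes defined on this slide:

```latex
\operatorname{Var}(\bar{x}_1 - \bar{x}_2)
  = \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2)
  = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
```

The variances add, even though the means are subtracted, because the samples are independent; taking the square root gives the standard deviation.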

6

Interval Estimate of $\mu_1 - \mu_2$: Large-Sample Case ($n_1 > 30$ and $n_2 > 30$)

Interval Estimate with $\sigma_1$ and $\sigma_2$ Known

$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$

where: $1 - \alpha$ is the confidence coefficient (level).

7

Interval Estimate of $\mu_1 - \mu_2$: Large-Sample Case ($n_1 > 30$ and $n_2 > 30$)

Interval Estimate with $\sigma_1$ and $\sigma_2$ Unknown

$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\,s_{\bar{x}_1 - \bar{x}_2}$

where:

$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

8

Example: Par, Inc.

Interval Estimate of $\mu_1 - \mu_2$: Large-Sample Case

Par, Inc. is a manufacturer of golf equipment and has developed a new golf ball that has been designed to provide “extra distance.” In a test of driving distance using a mechanical driving device, a sample of Par golf balls was compared with a sample of golf balls made by Rap, Ltd., a competitor.

The sample statistics appear on the next slide.

9

Example: Par, Inc.

Interval Estimate of $\mu_1 - \mu_2$: Large-Sample Case

– Sample Statistics

                  Sample #1                  Sample #2
                  Par, Inc.                  Rap, Ltd.
Sample Size       $n_1$ = 120 balls          $n_2$ = 80 balls
Mean              $\bar{x}_1$ = 235 yards    $\bar{x}_2$ = 218 yards
Standard Dev.     $s_1$ = ___ yards          $s_2$ = ____ yards

10

Point Estimate of the Difference Between Two Population Means

$\mu_1$ = mean distance for the population of Par, Inc. golf balls

$\mu_2$ = mean distance for the population of Rap, Ltd. golf balls

Point estimate of $\mu_1 - \mu_2$ = $\bar{x}_1 - \bar{x}_2$ = 235 - 218 = 17 yards.

Example: Par, Inc.

11

Point Estimator of the Difference Between the Means of Two Populations

Population 1: Par, Inc. Golf Balls
$\mu_1$ = mean driving distance of Par golf balls

Population 2: Rap, Ltd. Golf Balls
$\mu_2$ = mean driving distance of Rap golf balls

$\mu_1 - \mu_2$ = difference between the mean distances

Simple random sample of $n_1$ Par golf balls
$\bar{x}_1$ = sample mean distance for sample of Par golf balls

Simple random sample of $n_2$ Rap golf balls
$\bar{x}_2$ = sample mean distance for sample of Rap golf balls

$\bar{x}_1 - \bar{x}_2$ = Point Estimate of $\mu_1 - \mu_2$

12

95% Confidence Interval Estimate of the Difference Between Two Population Means: Large-Sample Case, $\sigma_1$ and $\sigma_2$ Unknown

Substituting the sample standard deviations for the population standard deviations:

$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 17 \pm 1.96\sqrt{\frac{(15)^2}{120} + \frac{(20)^2}{80}}$

= ___________ or 11.86 yards to 22.14 yards.

We are 95% confident that the difference between the mean driving distances of Par, Inc. balls and Rap, Ltd. balls lies in the interval of _______________ yards.

Example: Par, Inc.
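The large-sample interval for the Par, Inc. example can be checked numerically. The sketch below is illustrative only: the function name `large_sample_ci` is hypothetical, the sample standard deviations 15 and 20 are the values that appear in the slide's computation, and the normal quantile comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(xbar1, xbar2, s1, s2, n1, n2, confidence=0.95):
    """Interval estimate of mu1 - mu2 using the normal approximation."""
    point_estimate = xbar1 - xbar2
    # Estimated standard deviation of xbar1 - xbar2
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    # z_{alpha/2}: upper-tail quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_error
    return point_estimate - margin, point_estimate + margin

# Par, Inc. (sample 1) versus Rap, Ltd. (sample 2)
low, high = large_sample_ci(235, 218, 15, 20, 120, 80)
print(f"{low:.2f} to {high:.2f} yards")  # 11.86 to 22.14 yards
```

The result agrees with the interval stated on the slide.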

13

Interval Estimate of $\mu_1 - \mu_2$: Small-Sample Case ($n_1 < 30$ and/or $n_2 < 30$)

Interval Estimate with $\sigma^2$ Known (and equal)

$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$

where:

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

14

Interval Estimate of $\mu_1 - \mu_2$: Small-Sample Case ($n_1 < 30$ and/or $n_2 < 30$)

Interval Estimate with $\sigma^2$ Unknown (and assumed equal)

$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2}\,s_{\bar{x}_1 - \bar{x}_2}$

where:

$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

and the degrees of freedom for the t distribution is $n_1 + n_2 - 2$.

15

Example: Specific Motors

Specific Motors of Detroit has developed a new automobile known as the M car. 12 M cars and 8 J cars (from Japan) were road tested to compare miles-per-gallon (mpg) performance. The sample statistics are:

                      Sample #1                  Sample #2
                      M Cars                     J Cars
Sample Size           $n_1$ = 12 cars            $n_2$ = 8 cars
Mean                  $\bar{x}_1$ = 29.8 mpg     $\bar{x}_2$ = 27.3 mpg
Standard Deviation    $s_1$ = ____ mpg           $s_2$ = ____ mpg

16

Point Estimate of the Difference Between Two Population Means

$\mu_1$ = mean miles-per-gallon for the population of M cars

$\mu_2$ = mean miles-per-gallon for the population of J cars

Point estimate of $\mu_1 - \mu_2$ = $\bar{x}_1 - \bar{x}_2$ = ________ = ___ mpg.


17

95% Confidence Interval Estimate of the Difference Between Two Population Means: Small-Sample Case

We will make the following assumptions:

– The miles per gallon rating must be normally distributed for both the M car and the J car.

– The variance in the miles per gallon rating must be the same for both the M car and the J car.


18

95% Confidence Interval Estimate of the Difference Between Two Population Means: Small-Sample Case

Using the t distribution with $n_1 + n_2 - 2$ = ___ degrees of freedom, the appropriate t value is $t_{.025}$ = ______.

We will use a weighted average of the two sample variances as the pooled estimator of $\sigma^2$.


19

95% Confidence Interval Estimate of the Difference Between Two Population Means: Small-Sample Case

$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{11(2.56)^2 + 7(1.81)^2}{12 + 8 - 2} = 5.28$

$\bar{x}_1 - \bar{x}_2 \pm t_{.025}\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 2.5 \pm 2.101\sqrt{5.28\left(\frac{1}{12} + \frac{1}{8}\right)}$

= _____________, or .3 to 4.7 miles per gallon.

We are 95% confident that the difference between the mean mpg ratings of the two car types is from .3 to 4.7 mpg (with the M car having the higher mpg).
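The pooled-variance interval for the Specific Motors example can be reproduced with a few lines of Python. This is a minimal sketch, not a general-purpose routine: the function names are hypothetical, and the critical value 2.101 is passed in as given in the slide's computation rather than computed.

```python
from math import sqrt

def pooled_variance(s1, s2, n1, n2):
    """Weighted average of the two sample variances (pooled estimator of sigma^2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def small_sample_ci(xbar1, xbar2, s1, s2, n1, n2, t_crit):
    """Interval estimate of mu1 - mu2 assuming equal population variances."""
    s2_pooled = pooled_variance(s1, s2, n1, n2)
    std_error = sqrt(s2_pooled * (1 / n1 + 1 / n2))
    margin = t_crit * std_error
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# M cars (sample 1) versus J cars (sample 2); t_crit is the slide's t.025 value
low, high = small_sample_ci(29.8, 27.3, 2.56, 1.81, 12, 8, t_crit=2.101)
print(f"pooled variance = {pooled_variance(2.56, 1.81, 12, 8):.2f}")  # 5.28
print(f"{low:.1f} to {high:.1f} mpg")  # 0.3 to 4.7 mpg
```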


20

Hypothesis Tests About the Difference between the Means of Two Populations: Independent Samples

Hypotheses

H0: $\mu_1 - \mu_2 \leq 0$     H0: $\mu_1 - \mu_2 \geq 0$     H0: $\mu_1 - \mu_2 = 0$

Ha: $\mu_1 - \mu_2 > 0$     Ha: $\mu_1 - \mu_2 < 0$     Ha: $\mu_1 - \mu_2 \neq 0$

Test Statistic

Large-Sample

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Small-Sample

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

21

Hypothesis Tests About the Difference between the Means of Two Populations: Large-Sample Case

Par, Inc. is a manufacturer of golf equipment and has developed a new golf ball that has been designed to provide “extra distance.” In a test of driving distance using a mechanical driving device, a sample of Par golf balls was compared with a sample of golf balls made by Rap, Ltd., a competitor. The sample statistics appear on the next slide.

Example: Par, Inc.

22

Hypothesis Tests About the Difference Between the Means of Two Populations: Large-Sample Case

– Sample Statistics

                  Sample #1                  Sample #2
                  Par, Inc.                  Rap, Ltd.
Sample Size       $n_1$ = 120 balls          $n_2$ = 80 balls
Mean              $\bar{x}_1$ = 235 yards    $\bar{x}_2$ = 218 yards
Standard Dev.     $s_1$ = ____ yards         $s_2$ = ____ yards

Example: Par, Inc.

23

Hypothesis Tests About the Difference Between the Means of Two Populations: Large-Sample Case

Can we conclude, using a .01 level of significance, that the mean driving distance of Par, Inc. golf balls is greater than the mean driving distance of Rap, Ltd. golf balls?

Example: Par, Inc.

24

Hypothesis Tests About the Difference Between the Means of Two Populations: Large-Sample Case

$\mu_1$ = mean distance for the population of Par, Inc. golf balls

$\mu_2$ = mean distance for the population of Rap, Ltd. golf balls

• Hypotheses

H0: $\mu_1 - \mu_2 \leq 0$

Ha: $\mu_1 - \mu_2 > 0$

Example: Par, Inc.

25

Hypothesis Tests About the Difference Between the Means of Two Populations: Large-Sample Case

– Rejection Rule

Reject H0 if z > ________

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(235 - 218) - 0}{\sqrt{\frac{(15)^2}{120} + \frac{(20)^2}{80}}} = \frac{17}{2.62} = 6.49$

Example: Par, Inc.
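The upper-tail z test for the Par, Inc. example can be sketched in Python as follows. The function name `two_sample_z` is hypothetical, the standard deviations 15 and 20 are taken from the slide's computation, and the critical value is obtained from the standard-library `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, s1, s2, n1, n2, d0=0.0):
    """Large-sample test statistic for H0: mu1 - mu2 <= d0."""
    return ((xbar1 - xbar2) - d0) / sqrt(s1**2 / n1 + s2**2 / n2)

alpha = 0.01
z = two_sample_z(235, 218, 15, 20, 120, 80)
# Upper-tail critical value z_alpha for a one-tailed test
z_crit = NormalDist().inv_cdf(1 - alpha)
reject = z > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

Without intermediate rounding the statistic is about 6.48; the slide's 6.49 results from rounding the denominator to 2.62 first. The decision is the same either way.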

26

Hypothesis Tests About the Difference Between the Means of Two Populations: Large-Sample Case

• Conclusion

Reject H0. We are at least 99% confident that the mean driving distance of Par, Inc. golf balls is greater than the mean driving distance of Rap, Ltd. golf balls.

Example: Par, Inc.

27

Hypothesis Tests About the Difference Between the Means of Two Populations: Small-Sample Case

Can we conclude, using a .05 level of significance, that the miles-per-gallon (mpg) performance of M cars is greater than the miles-per-gallon performance of J cars?


28

Hypothesis Tests About the Difference Between the Means of Two Populations: Small-Sample Case

$\mu_1$ = mean mpg for the population of M cars

$\mu_2$ = mean mpg for the population of J cars

• Hypotheses

H0: $\mu_1 - \mu_2 \leq 0$

Ha: $\mu_1 - \mu_2 > 0$

Example: Specific Motors

29


Hypothesis Tests About the Difference Between the Means of Two Populations: Small-Sample Case

– Rejection Rule

Reject H0 if t > _______

($\alpha$ = .05, d.f. = 18)

– Test Statistic

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

where:

$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
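As a rough check of the small-sample test statistic, the sketch below evaluates it for the Specific Motors data. The function name `pooled_t` is hypothetical, the sample standard deviations 2.56 and 1.81 are those used in the earlier interval-estimate slide, and the critical value 1.734 is an assumption taken from a standard t table for 18 degrees of freedom; it is not stated in the slides.

```python
from math import sqrt

def pooled_t(xbar1, xbar2, s1, s2, n1, n2, d0=0.0):
    """Small-sample test statistic using the pooled variance estimate."""
    s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return ((xbar1 - xbar2) - d0) / sqrt(s2_pooled * (1 / n1 + 1 / n2))

t = pooled_t(29.8, 27.3, 2.56, 1.81, 12, 8)

# Assumption: t.05 with 18 d.f., read from a standard t table
T_CRIT = 1.734
print(f"t = {t:.2f}, reject H0: {t > T_CRIT}")
```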

30

Inference About the Difference between the Means of Two Populations: Matched Samples

With a matched-sample design each sampled item provides a pair of data values.

The matched-sample design can be referred to as blocking.

This design often leads to a smaller sampling error than the independent-sample design because variation between sampled items is eliminated as a source of sampling error.
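One way to see why matching can reduce sampling error is to compare the variance of the mean difference under the two designs. The sketch below assumes equal sample sizes $n$ and introduces $\rho$, a symbol not used in the slides, for the correlation between the two measurements taken on the same item:

```latex
\text{Matched samples:}\quad
\operatorname{Var}(\bar{d}) = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{n}
\qquad
\text{Independent samples:}\quad
\operatorname{Var}(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2 + \sigma_2^2}{n}
```

When the paired measurements are positively correlated ($\rho > 0$), the matched-sample variance is the smaller of the two.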

31

Example: Express Deliveries

Inference About the Difference between the Means of Two Populations: Matched Samples

A Chicago-based firm has documents that must be quickly distributed to district offices throughout the U.S. The firm must decide between two delivery services, UPX (United Parcel Express) and INTEX (International Express), to transport its documents. In testing the delivery times of the two services, the firm sent two reports to a random sample of ten district offices with one report carried by UPX and the other report carried by INTEX.

Do the data that follow indicate a difference in mean delivery times for the two services?

32

                     Delivery Time (Hours)
District Office      UPX     INTEX     Difference
Seattle               32        25          7
Los Angeles           30        24          6
Boston                19        15          4
Cleveland             16        15          1
New York              15        13          2
Houston               18        15          3
Atlanta               14        15         -1
St. Louis             10         8          2
Milwaukee              7         9         -2
Denver                16        11          5


33

Inference About the Difference between the Means of Two Populations: Matched Samples

Let $\mu_d$ = the mean of the difference values for the two delivery services for the population of district offices

– Hypotheses

H0: $\mu_d = 0$

Ha: $\mu_d \neq 0$


34

Inference About the Difference between the Means of Two Populations: Matched Samples

• Rejection Rule

Assuming the population of difference values is approximately normally distributed, the t distribution with n - 1 degrees of freedom applies. With $\alpha$ = .05, $t_{.025}$ = 2.262 (9 degrees of freedom).

Reject H0 if t < _________ or if t > __________

Example: Express Deliveries

35

Inference About the Difference between the Means of Two Populations: Matched Samples

$\bar{d} = \frac{\sum d_i}{n} = \frac{(7 + 6 + \dots + 5)}{10} = 2.7$

$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{76.1}{9}} = 2.9$

$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{2.7 - 0}{2.9 / \sqrt{10}} = 2.94$
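The matched-sample computation can be reproduced directly from the delivery-time table. This is a minimal sketch using only Python's standard library; the variable names are illustrative, and the critical value 2.262 is the $t_{.025}$ value quoted in the rejection rule for 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Delivery times in hours, in the order the district offices are listed
upx = [32, 30, 19, 16, 15, 18, 14, 10, 7, 16]
intex = [25, 24, 15, 15, 13, 15, 15, 8, 9, 11]

# One difference value per district office (UPX minus INTEX)
d = [u - i for u, i in zip(upx, intex)]

d_bar = mean(d)   # sample mean of the differences
s_d = stdev(d)    # sample standard deviation (n - 1 in the denominator)
t = (d_bar - 0) / (s_d / sqrt(len(d)))

T_CRIT = 2.262    # t.025 with 9 d.f., as quoted in the slides
reject = abs(t) > T_CRIT
print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, t = {t:.2f}, reject H0: {reject}")
```

Carrying full precision gives t of roughly 2.936; the slide's 2.94 uses the standard deviation rounded to 2.9. Both exceed 2.262.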


36

Inference About the Difference between the Means of Two Populations: Matched Samples

• Conclusion

Reject H0.

There is a significant difference between the mean delivery times for the two services.

Example: Express Deliveries

37

Introduction to Analysis of Variance

Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means using data obtained from observational or experimental studies.

We want to use the sample results to test the following hypotheses.

H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

Ha: Not all population means are equal

38

Introduction to Analysis of Variance

• If H0 is rejected, we cannot conclude that all population means are different.

• Rejecting H0 means that at least two population means have different values.

39

Assumptions for Analysis of Variance

For each population, the response variable is normally distributed.

The variance of the response variable, denoted $\sigma^2$, is the same for all of the populations.

The observations must be independent.

40

Analysis of Variance: Testing for the Equality of k Population Means

Between-Treatments Estimate of Population Variance

Within-Treatments Estimate of Population Variance

Comparing the Variance Estimates: The F Test

The ANOVA Table

41

Between-Treatments Estimate of Population Variance

A between-treatment estimate of $\sigma^2$ is called the mean square treatment and is denoted MSTR.

$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$

The numerator of MSTR is called the sum of squares treatment and is denoted SSTR.

The denominator of MSTR represents the degrees of freedom associated with SSTR.

42

Within-Samples Estimate of Population Variance

The estimate of $\sigma^2$ based on the variation of the sample observations within each sample is called the mean square error and is denoted by MSE.

$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1)s_j^2}{n_T - k}$

The numerator of MSE is called the sum of squares error and is denoted by SSE.

The denominator of MSE represents the degrees of freedom associated with SSE.

43

Comparing the Variance Estimates: The F Test

If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with MSTR d.f. equal to $k - 1$ and MSE d.f. equal to $n_T - k$.

If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates $\sigma^2$.

Hence, we will reject H0 if the resulting value of MSTR/MSE appears to be too large to have been selected at random from the appropriate F distribution.

44

Test for the Equality of k Population Means

Hypotheses

H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

Ha: Not all population means are equal

Test Statistic

F = MSTR/MSE

Rejection Rule

Reject H0 if $F > F_\alpha$

where the value of $F_\alpha$ is based on an F distribution with $k - 1$ numerator degrees of freedom and $n_T - k$ denominator degrees of freedom.

45

Sampling Distribution of MSTR/MSE

The figure below shows the rejection region associated with a level of significance equal to $\alpha$, where $F_\alpha$ denotes the critical value.

[Figure: sampling distribution of MSTR/MSE. Values to the left of the critical value $F_\alpha$ fall in the "Do Not Reject H0" region; values to the right fall in the "Reject H0" region.]

46

ANOVA Table

Source of      Sum of      Degrees of     Mean
Variation      Squares     Freedom        Squares     F
Treatment      SSTR        k - 1          MSTR        MSTR/MSE
Error          SSE         nT - k         MSE
Total          SST         nT - 1

SST divided by its degrees of freedom $n_T - 1$ is simply the overall sample variance that would be obtained if we treated the entire $n_T$ observations as one data set.

$\text{SST} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{\bar{x}})^2 = \text{SSTR} + \text{SSE}$

47

Example: Reed Manufacturing

Analysis of Variance

J. R. Reed would like to know if the mean number of hours worked per week is the same for the department managers at her three manufacturing plants (Buffalo, Pittsburgh, and Detroit).

A simple random sample of 5 managers from each of the three plants was taken and the number of hours worked by each manager for the previous week is shown on the next slide.

48

Analysis of Variance

                     Plant 1     Plant 2        Plant 3
Observation          Buffalo     Pittsburgh     Detroit
1                    48          73             51
2                    54          63             63
3                    57          66             61
4                    54          64             54
5                    62          74             56

Sample Mean          55          68             57
Sample Variance      ____        _____          ______


49

Analysis of Variance

– Hypotheses

H0: $\mu_1 = \mu_2 = \mu_3$

Ha: Not all the means are equal

where:

$\mu_1$ = mean number of hours worked per week by the managers at Plant 1

$\mu_2$ = mean number of hours worked per week by the managers at Plant 2

$\mu_3$ = mean number of hours worked per week by the managers at Plant 3


50

Analysis of Variance

– Mean Square Treatment

Since the sample sizes are all equal, $\bar{\bar{x}}$ = (55 + 68 + 57)/3 = ____

SSTR = $5(55 - 60)^2 + 5(68 - 60)^2 + 5(57 - 60)^2$ = ____

MSTR = 490/(3 - 1) = 245

– Mean Square Error

SSE = 4(26.0) + 4(26.5) + 4(24.5) = _____

MSE = 308/(15 - 3) = 25.667
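The two variance estimates can be computed from the raw Reed Manufacturing observations. The sketch below is illustrative; the dictionary layout and variable names are assumptions, and only the standard-library `statistics` module is used.

```python
from statistics import mean, variance

# Hours worked in the previous week by the sampled managers at each plant
samples = {
    "Buffalo": [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit": [51, 63, 61, 54, 56],
}

k = len(samples)                              # number of populations
n_T = sum(len(v) for v in samples.values())   # total number of observations
grand_mean = sum(sum(v) for v in samples.values()) / n_T

# Between-treatments: weighted squared deviations of sample means from the overall mean
sstr = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in samples.values())
mstr = sstr / (k - 1)

# Within-treatments: pooled sample variances
sse = sum((len(v) - 1) * variance(v) for v in samples.values())
mse = sse / (n_T - k)

print(f"SSTR = {sstr:.0f}, MSTR = {mstr:.0f}")   # SSTR = 490, MSTR = 245
print(f"SSE = {sse:.0f}, MSE = {mse:.3f}")       # SSE = 308, MSE = 25.667
```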


51

Analysis of Variance

– F Test

If H0 is true, the ratio MSTR/MSE should be near 1 since both MSTR and MSE are estimating $\sigma^2$. If Ha is true, the ratio should be significantly larger than 1 since MSTR tends to overestimate $\sigma^2$.


52

Analysis of Variance

• Rejection Rule

Assuming $\alpha$ = .05, $F_{.05}$ = 3.89 (2 d.f. numerator, 12 d.f. denominator). Reject H0 if F > _______

• Test Statistic

F = MSTR/MSE = 245/25.667 = _______

Example: Reed Manufacturing

53

Analysis of Variance

– ANOVA Table

Source of      Sum of      Degrees of     Mean
Variation      Squares     Freedom        Square      F
Treatments     490         2              245         9.55
Error          308         12             25.667
Total          798         14
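The entries of the ANOVA table, including the partition SST = SSTR + SSE, can be verified from the raw data. This is a sketch under the same data as the Reed Manufacturing slides; the variable names are illustrative, and the critical value 3.89 is the $F_{.05}$ value quoted in the rejection rule for 2 and 12 degrees of freedom.

```python
# Observations for Buffalo, Pittsburgh, and Detroit, in that order
samples = [
    [48, 54, 57, 54, 62],
    [73, 63, 66, 64, 74],
    [51, 63, 61, 54, 56],
]

all_obs = [x for s in samples for x in s]
k, n_T = len(samples), len(all_obs)
grand_mean = sum(all_obs) / n_T

# Total sum of squares, computed directly from every observation
sst = sum((x - grand_mean) ** 2 for x in all_obs)
# Treatment and error sums of squares, computed independently of SST
sstr = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
sse = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)

f_stat = (sstr / (k - 1)) / (sse / (n_T - k))
F_CRIT = 3.89  # F.05 with 2 and 12 d.f., as quoted in the slides

print(f"Treatments  {sstr:6.0f} {k - 1:3d}")
print(f"Error       {sse:6.0f} {n_T - k:3d}")
print(f"Total       {sst:6.0f} {n_T - 1:3d}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")
```

Computing SST on its own and then comparing it with SSTR + SSE is a useful arithmetic check when the table is filled in by hand.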


54

Analysis of Variance

• Conclusion

F = 9.55 > $F_{.05}$ = _____, so we reject H0. The mean number of hours worked per week by department managers is not the same at each plant.

Example: Reed Manufacturing

55

End of Chapter 10
