part 14: statistical tests – part 2 14-1/25 statistics and data analysis professor william greene...


Uploaded by marquez-priddle on 02-Apr-2015.


TRANSCRIPT


Part 14: Statistical Tests – Part 2

Statistics and Data Analysis

Professor William Greene

Stern School of Business

IOMS Department

Department of Economics


Statistics and Data Analysis

Part 14 – Statistical Tests: 2


Statistical Testing Applications

Methodology

Analyzing Means

Analyzing Proportions


Classical Testing Methodology

Formulate the hypothesis.

Determine the appropriate test.

Decide upon the α level, the probability of rejecting H0 when it is actually true. (How confident do we want to be in the results?) The worldwide standard is α = 0.05.

Formulate the decision rule (reject vs. not reject), i.e., define the rejection region.

Obtain the data.

Apply the test and make the decision.
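The decision-rule step can be made concrete in a few lines. This is a minimal sketch, assuming Python's standard `statistics.NormalDist`; the helper names `critical_value` and `decide` are hypothetical. It recovers the familiar 1.96 for α = 0.05 and applies the two-sided rule used throughout this part.

```python
from statistics import NormalDist

def critical_value(alpha: float = 0.05) -> float:
    """Two-sided critical value z* with P(|Z| > z*) = alpha for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def decide(test_statistic: float, alpha: float = 0.05) -> str:
    """Decision rule: reject H0 when the statistic falls in the rejection region |z| > z*."""
    if abs(test_statistic) > critical_value(alpha):
        return "reject H0"
    return "do not reject H0"

print(round(critical_value(0.05), 2))  # 1.96
```

Changing `alpha` moves the boundary of the rejection region; the rest of the procedure is unchanged.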


Comparing Two Populations

These are data on the number of calls cleared by the operators at two call centers on the same day. Call center 1 uses a different set of procedures than call center 2 for directing calls to operators.

Do the data suggest that the populations are different?

Call Center 1 (28 observations):
797 794 817 813 817 793 762 719 804 811 747 804 790 796 807 801 805 811 835 787 800 771 794 805 797 724 820 701

Call Center 2 (32 observations):
817 801 798 797 788 802 821 779 803 807 789 799 794 792 826 808 808 844 790 814 784 839 805 817 804 807 800 785 796 789 842 829


Application 1: Equal Means

Application: Mean calls cleared at the two call centers are the same

H0: μ1 = μ2

H1: μ1 ≠ μ2

Rejection region: Sample means from centers 1 and 2 are very different.

Complication: What should we use for the variance(s) in the standard error of the difference?


Standard Approach

H0: μ1 = μ2

H1: μ1 ≠ μ2

Equivalently, H0: μ1 – μ2 = 0. The test is based on the difference between the two sample means, $\bar{x}_1 - \bar{x}_2$.

Reject the null hypothesis if $\bar{x}_1 - \bar{x}_2$ is very different from zero (in either direction).

The rejection region is large positive or large negative values of $\bar{x}_1 - \bar{x}_2$.


Rejection Region for Two Means

Reject H0 if $|\bar{x}_1 - \bar{x}_2| > t \cdot s$,

where t is the critical t value (from the normal distribution here). Use 1.96, as usual, for 5% significance. "s" is the standard error of the difference in the means. What should we use for it?

Two issues:

Are the variances equal in the two populations?

Are both sample sizes large enough to use the CLT?


Easiest Approach: Large Samples

Assume the samples are relatively large, so we can use the central limit theorem.

Then it makes little difference whether the variances are assumed to be (or actually are) the same or not.


Variance Estimator

In all cases (equal or unequal variances), you can use

$$s^* = \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$$

Use 1.96 for the critical t value because we are using the central limit theorem, which allows us to use the normal distribution.
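A minimal sketch of the s* formula in Python; `se_difference` is a hypothetical helper name, and the inputs are the standard deviations and sample sizes reported for the two call centers later in this part.

```python
import math

def se_difference(s1: float, n1: int, s2: float, n2: int) -> float:
    """Large-sample standard error of x̄1 - x̄2: s* = sqrt(s1²/N1 + s2²/N2)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Call-center summary statistics: s1 = 32.00, N1 = 28; s2 = 16.87, N2 = 32.
print(round(se_difference(32.00, 28, 16.87, 32), 2))  # 6.74
```

Each sample contributes its own variance term, which is why the same formula works whether or not the two population variances are equal.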


Test of Means

H0: μCall Center 1 – μCall Center 2 = 0

H1: μCall Center 1 – μCall Center 2 ≠ 0

Use α = 0.05.

Rejection region:

$$\frac{|(\bar{x}_1 - \bar{x}_2) - 0|}{s^*} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{s_1^2/N_1 + s_2^2/N_2}} > 1.96$$


Basic Comparisons

[Figure: Boxplot of Center1, Center2; vertical axis "Data", 700 to 860]

Descriptive Statistics: Center1, Center2

Variable   N   Mean     SE Mean   StDev   Min.     Med.     Max.
Center1    28  790.07   6.05      32.00   701.00   798.50   835.00
Center2    32  805.44   2.98      16.87   779.00   802.50   844.00

The means look different.

The standard deviations (variances) look quite different: 32.00 vs. 16.87.
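As a cross-check, the descriptive statistics above can be recomputed from the raw call counts on the "Comparing Two Populations" slide. This is a minimal sketch using Python's standard `statistics` module (sample standard deviation with divisor N - 1); `describe` is a hypothetical helper, not a feature of the software that produced the slide output.

```python
import math
import statistics

# Calls cleared, copied from the "Comparing Two Populations" slide.
center1 = [797, 794, 817, 813, 817, 793, 762, 719, 804, 811, 747, 804, 790, 796,
           807, 801, 805, 811, 835, 787, 800, 771, 794, 805, 797, 724, 820, 701]
center2 = [817, 801, 798, 797, 788, 802, 821, 779, 803, 807, 789, 799, 794, 792,
           826, 808, 808, 844, 790, 814, 784, 839, 805, 817, 804, 807, 800, 785,
           796, 789, 842, 829]

def describe(xs):
    """Same columns as the descriptive-statistics table."""
    n = len(xs)
    sd = statistics.stdev(xs)              # sample SD, divisor n - 1
    return {
        "N": n,
        "Mean": statistics.mean(xs),
        "SE Mean": sd / math.sqrt(n),      # standard error of the mean
        "StDev": sd,
        "Min.": min(xs),
        "Med.": statistics.median(xs),     # average of the two middle values
        "Max.": max(xs),
    }

for name, xs in (("Center1", center1), ("Center2", center2)):
    summary = describe(xs)
    print(name, {k: round(v, 2) for k, v in summary.items()})
```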


Test for the Difference

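$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}} = \frac{790.07 - 805.44}{\sqrt{\dfrac{32.00^2}{28} + \dfrac{16.87^2}{32}}} = \frac{-15.37}{\sqrt{45.465}} = \frac{-15.37}{6.742} = -2.279.$$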

This is larger (in absolute value) than 1.96, so we reject the null hypothesis that the means are equal. It appears that the mean numbers of calls cleared at the two centers are different.

Menu path: Stat > Basic Statistics > 2 Sample t (do not check the equal variances box). The test can also be run by providing just the sample sizes, means, and standard deviations.
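The same large-sample test can be run from the summary numbers alone. This is a minimal sketch, assuming the normal critical value 1.96 used on these slides; `two_sample_z` is a hypothetical helper name. A software two-sample t procedure uses a t distribution instead, so its reported p-value can differ slightly, although with these numbers the conclusion is the same.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, z_crit=1.96):
    """Large-sample test of H0: mu1 - mu2 = 0 with unpooled variances s1²/N1 + s2²/N2."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # s*, the standard error of the difference
    z = ((mean1 - mean2) - 0.0) / se                # hypothesized difference is 0
    return z, abs(z) > z_crit                       # two-sided: reject if |z| > z_crit

# Summary statistics for the two call centers (mean, StDev, N).
z, reject = two_sample_z(790.07, 32.00, 28, 805.44, 16.87, 32)
print(f"z = {z:.3f}, reject H0: {reject}")          # z = -2.279, reject H0: True
```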


Application: Paired Samples

Example: Do-overs on SAT tests

Hypothesis: Scores on the second test are no better than scores on the first. (Hmmm… a one-sided test…)

Hypothesis: Scores on the second test are the same as on the first.

Rejection region: The mean of a sample of second scores is very different from the mean of the sample of first scores.

Subsidiary question: Is the observed difference (to the extent there is one) explained by the test prep courses? How would we test this?

Interesting question: Suppose the samples were not paired, just two separate samples.


Paired Samples

No new theory is needed.

Compute the difference for each observation (each pair).

Treat the differences as a single sample from a population with a hypothesized mean of zero.
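The slides give no SAT data, so this sketch shows only the mechanics of reducing pairs to differences; `paired_test_statistic` is a hypothetical name. For a large sample, compare |statistic| with 1.96 for the two-sided test, or the statistic with 1.645 for the one-sided "no better" version; with a small sample, a t critical value is the usual choice.

```python
import math
import statistics

def paired_test_statistic(first, second):
    """Mean of the paired differences divided by its standard error.

    H0: the population mean of (second - first) is zero.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]          # one difference per person
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))    # SE of the mean difference
    return statistics.mean(diffs) / se
```

Once the differences are computed, this is just the one-sample test of a mean against zero.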


Testing Application 2: Proportion

Investigate: Proportion = a value

Quality control: Has the rate of defectives produced by a machine changed?

H0: θ = θ0 (θ0 = the value we thought it was)

H1: θ ≠ θ0

Rejection region: A sample of the machine's output produces a proportion of defectives that is far from θ0.


Procedure for Testing a Proportion

Use the central limit theorem: the sample proportion, p, is a sample mean (the mean of 0/1 outcomes), so treat it as normally distributed.

The sample variance is p(1 - p). The estimator of the variance of the mean is p(1 - p)/N.
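Because p is just the mean of 0/1 outcomes, Python's standard `statistics` module can confirm these claims directly. A minimal sketch; the 0/1 list is rebuilt from the default-rate counts (996 defaults among 10,499 accepted applicants) used later in this part.

```python
import statistics

outcomes = [1] * 996 + [0] * (10_499 - 996)   # 1 = default, 0 = no default

p = statistics.mean(outcomes)          # the sample proportion is a sample mean
var = statistics.pvariance(outcomes)   # divisor N, so this equals p(1 - p)
var_of_mean = var / len(outcomes)      # estimated variance of p: p(1 - p)/N

print(round(p, 5))                     # 0.09487
print(abs(var - p * (1 - p)) < 1e-12)  # True
```

`pvariance` (divisor N) is used rather than `stdev`/`variance` (divisor N - 1) because p(1 - p) is the divisor-N variance of 0/1 data.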


Testing a Proportion

H0: θ = θ0

H1: θ ≠ θ0

As usual, set α = 0.05. Treat this as a test of a mean. Rejection region = sample proportions that are far from θ0.

$$\text{Test statistic} = \frac{p - \theta_0}{\sqrt{\theta_0(1 - \theta_0)/N}}$$

Note: assuming θ = θ0 implies that we are also assuming the variance is θ0(1 - θ0).


Default Rate

Investigation: Of the 13,444 card applications, 10,499 were accepted.

The default rate for those 10,499 was 996/10,499 = 0.09487.

I am fairly sure that this number is higher than was really appropriate for cardholders at this time. I think the right number is closer to 6%.

Do the data support my hypothesis?


Testing the Default Rate

p = 0.09487, θ0 = 0.06. As usual, use 5%.

$$z = \frac{0.09487 - 0.06}{\sqrt{0.06(1 - 0.06)/10{,}499}} = 15.045$$

This is much larger than the critical value of 1.96, so my hypothesis is rejected.
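The default-rate calculation can be checked in a few lines. A minimal sketch; `proportion_z` is a hypothetical helper. It uses the exact counts 996/10,499 rather than the rounded 0.09487, which changes the statistic only in the second decimal place.

```python
import math

def proportion_z(p, theta0, n):
    """z = (p - θ0) / sqrt(θ0(1 - θ0)/N); the variance is evaluated under H0: θ = θ0."""
    return (p - theta0) / math.sqrt(theta0 * (1 - theta0) / n)

p = 996 / 10_499                      # observed default rate among accepted applicants
z = proportion_z(p, 0.06, 10_499)
print(round(z, 2), abs(z) > 1.96)     # 15.04 True
```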


Application 3: Comparing Proportions

Investigate: Owners and Renters have the same credit card acceptance rate

H0: θRENTERS = θOWNERS

H1: θRENTERS ≠ θOWNERS

Rejection region: The acceptance rates in samples of the two types of applicants are very different.


Comparing Proportions

H0: θOWNERS – θRENTERS = 0

H1: θOWNERS – θRENTERS ≠ 0

Use α = 0.05 as usual.

Base the test on

$$t = \frac{(p_O - p_R) - 0}{\sqrt{\dfrac{p_O(1 - p_O)}{N_O} + \dfrac{p_R(1 - p_R)}{N_R}}}$$

If |t| is greater than the critical value, reject the null hypothesis. We are using the CLT throughout, so use the normal distribution: the critical value is z = 1.96.

Note: here we are not assuming specific values for θO or θR, so we use the sample variances.


Some Evidence

[Figure; legend: Homeowners]


Analysis

$$p_O = \frac{5030}{5030 + 1100} = 0.8206, \qquad p_R = \frac{5469}{5469 + 1845} = 0.7477$$

$$z = \frac{0.8206 - 0.7477}{\sqrt{\dfrac{0.8206(0.1794)}{6130} + \dfrac{0.7477(0.2523)}{7314}}} = \frac{0.0729}{0.007058} \approx 10.33$$

This is larger than the critical value of 1.96, so the hypothesis that the proportions are equal is rejected.
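A short sketch that recomputes the owner/renter comparison from the counts; `two_proportion_z` is a hypothetical helper implementing the unpooled statistic from the "Comparing Proportions" slide. Carrying full precision, the standard error comes out near 0.00706 and z a little above 10.3, far beyond 1.96 either way.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Unpooled z = (p1 - p2) / sqrt(p1(1 - p1)/N1 + p2(1 - p2)/N2)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1, p2, se, (p1 - p2) / se

# Accepted applications: owners 5030 of 5030 + 1100, renters 5469 of 5469 + 1845.
p_o, p_r, se, z = two_proportion_z(5030, 5030 + 1100, 5469, 5469 + 1845)
print(f"p_O = {p_o:.4f}, p_R = {p_r:.4f}, SE = {se:.6f}, z = {z:.2f}")
```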


Followup Analysis of Default

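DEFAULT (columns) by OWNRENT (rows); each cell shows the count and the percent of all 10,499 accepted applicants. OWNRENT = 1 denotes homeowners (5,030 accepted, as on the previous slide); OWNRENT = 0 denotes renters.

OWNRENT    DEFAULT = 0      DEFAULT = 1    All
0          4854 (46.23%)    615 (5.86%)    5469 (52.09%)
1          4649 (44.28%)    381 (3.63%)    5030 (47.91%)
All        9503 (90.51%)    996 (9.49%)    10499 (100.00%)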

Are the default rates the same for owners and renters? The data for the 10,499 applicants who were accepted are in the table above. Test the hypothesis that the two default rates are the same.
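One way to check an answer to this exercise is to apply the same unpooled two-proportion statistic to the table's counts. This is a sketch, not the author's solution; it assumes OWNRENT = 1 denotes homeowners, which matches the 5,030 accepted owners on the acceptance-rate slide.

```python
import math

# Counts from the DEFAULT x OWNRENT table (accepted applicants only).
defaults = {"renters": 615, "owners": 381}      # DEFAULT = 1 column
accepted = {"renters": 5469, "owners": 5030}    # row totals

p_r = defaults["renters"] / accepted["renters"]  # renters' default rate
p_o = defaults["owners"] / accepted["owners"]    # owners' default rate

# Unpooled standard error, as on the "Comparing Proportions" slide.
se = math.sqrt(p_r * (1 - p_r) / accepted["renters"] + p_o * (1 - p_o) / accepted["owners"])
z = (p_r - p_o) / se
print(f"renters {p_r:.4f}, owners {p_o:.4f}, z = {z:.2f}, reject H0: {abs(z) > 1.96}")
```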