TOPIC 10 Discrete (Categorical) Data Analysis

Upload: bertram-stanley

Post on 18-Dec-2015


Page 1: TOPIC 10 Discrete (Categorical) Data Analysis. Discrete Random Variables Recall that discrete random variables may take only discrete values. For example,

TOPIC 10

Discrete (Categorical) Data Analysis

Discrete Random Variables

Recall that discrete random variables may take only discrete values.

For example,
• Number of errors in a software product: 0, 1, 2, 3, 4, …
• Categories of a product's quality level: High, medium, or low
• Characteristics of a machine breakdown: Mechanical failure, electrical failure, or operator misuse.

Sample Proportions

Recall that the success probability p can be estimated by the sample proportion

$$\hat{p} = \frac{x}{n}$$

For large enough values of n, the sample proportion can be taken to have approximately the normal distribution

$$\hat{p} \sim N\!\left(p,\ \frac{p(1-p)}{n}\right)$$

This expression may be written in terms of a standard normal distribution as

$$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0, 1)$$

where $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$ is the standard error of $\hat{p}$.

Confidence Interval Estimation for p

Assumptions:

$$n\hat{p} \ge 15, \qquad n(1 - \hat{p}) \ge 15$$

Since the true proportion p is unknown, we replace p in the standard error with its estimate $\hat{p}$:

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Example

You're a production manager for a newspaper. You want to find the % defective. Of 200 newspapers, 35 had defects. What is the 90% confidence interval estimate of the population proportion defective?

Example Solution

With $\hat{p} = 35/200 = 0.175$ and $Z_{\alpha/2} = 1.645$:

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

$$0.175 - 1.645\sqrt{\frac{0.175\,(0.825)}{200}} \;\le\; p \;\le\; 0.175 + 1.645\sqrt{\frac{0.175\,(0.825)}{200}}$$

$$0.1308 \le p \le 0.2192$$
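The interval in the example above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `proportion_ci` is an illustrative assumption, not something from the slides, and the critical value comes from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    # Large-sample condition stated in the slides.
    if n * p_hat < 15 or n * (1 - p_hat) < 15:
        raise ValueError("sample too small for the normal approximation")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Newspaper example: 35 defective out of 200, 90% confidence.
lower, upper = proportion_ci(35, 200, 0.90)
print(f"{lower:.4f} <= p <= {upper:.4f}")
```

To four decimals this should agree with the slide's interval, 0.1308 to 0.2192.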

Sample Size for Estimating p

I don't want to sample too much or too little!

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{SE}{\sqrt{p(1-p)/n}} \quad\Longrightarrow\quad n = \frac{Z^2\,p(1-p)}{SE^2}$$

SE = Sampling Error

• If no estimate of p is available, use p = 1 – p = 0.5

Example

What sample size is needed to estimate p with 90% confidence and a width L of .03?

$$SE = \frac{L}{2} = \frac{0.03}{2} = 0.015$$

$$n = \frac{Z^2\,p(1-p)}{SE^2} = \frac{1.645^2\,(0.5)(0.5)}{0.015^2} = 3006.69 \approx 3007$$
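The sample-size calculation above can be sketched in a few lines. The function name `sample_size_for_proportion` and the default `p=0.5` (the slides' fallback when no estimate of p is available) are illustrative choices; the result is rounded up because a fractional observation cannot be sampled.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(confidence, sampling_error, p=0.5):
    """Smallest whole n satisfying n >= Z^2 * p * (1 - p) / SE^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / sampling_error ** 2)


width = 0.03  # total interval width L, so SE = L / 2
n_required = sample_size_for_proportion(0.90, width / 2)
print(n_required)
```

With the unrounded critical value the raw figure differs slightly from the slide's 3006.69, but it should round up to the same answer of 3007.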

Exercises

• Suppose that the auditing procedures require you to have 95% confidence in estimating the population proportion of sales invoices with errors within ± 0.07. The results from the past months indicate that the largest proportion has been no more than 0.15. Find the sample size needed to satisfy the requirements of the company.

• In an election poll, a random sample of 500 people showed that 42 preferred voting for a particular candidate. Set up a 90% confidence interval estimate for the population proportion, p, of people who prefer that candidate.

Z Test of Hypothesis for the Proportion

• One sample Z test for the proportion

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$

where

$$\hat{p} = \frac{X}{n} = \frac{\text{Number of items having the characteristic of interest}}{\text{Sample size}}$$

$\hat{p}$ = Sample proportion of successes

p = Hypothesized proportion of successes in the population

Example

You're an accounting manager. A year-end audit showed 4% of transactions had errors. You implement new procedures. A random sample of 500 transactions had 25 errors. Has the proportion of incorrect transactions changed at the .05 level of significance?

Example Solution

• H0: p = .04
• Ha: p ≠ .04
• α = .05, α/2 = 0.025
• n = 500
• Critical Value(s): Z = ±1.96 (reject H0 if Z < -1.96 or Z > 1.96; area .025 in each tail)

Test Statistic:

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{25/500 - .04}{\sqrt{.04\,(.96)/500}} = 1.14$$

Decision: Do not reject H0 at α = .05

Conclusion: There is insufficient evidence that the proportion of incorrect transactions has changed from 4%
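The test in the example above can be reproduced as follows. This is a sketch under the slides' setup (two-sided alternative, hypothesized p used in the standard error); the function name `one_sample_proportion_z` is an assumption, and the p-value line is an addition that the slides do not compute.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(x, n, p0):
    """Z statistic for H0: p = p0, with p0 in the standard error."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


alpha = 0.05
z = one_sample_proportion_z(25, 500, 0.04)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # not shown in the slides
reject = abs(z) > z_crit
print(f"Z = {z:.2f}, critical = {z_crit:.2f}, reject H0: {reject}")
```

The statistic should come out at about 1.14, inside the ±1.96 limits, so H0 is not rejected.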

Exercise

• A fast-food chain has developed a new process to ensure that orders at the drive-through are filled correctly. The previous process filled orders correctly 85% of the time. Based on a sample of 100 orders using the new process, 94 were filled correctly. At a 0.01 level of significance, can you conclude that the new process has increased the proportion of orders filled correctly?

Large-Sample Inference about p1 – p2

Assumptions:
• Independent, random samples
• Normal approximation can be used if

$$n_1\hat{p}_1 \ge 15,\quad n_1(1-\hat{p}_1) \ge 15,\quad n_2\hat{p}_2 \ge 15,\quad n_2(1-\hat{p}_2) \ge 15$$

• (1 – α)100% Confidence Interval for (p1 – p2)

$$(\hat{p}_1 - \hat{p}_2) - Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \;\le\; p_1 - p_2 \;\le\; (\hat{p}_1 - \hat{p}_2) + Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

or

$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

• where

$$\hat{q}_1 = 1 - \hat{p}_1, \qquad \hat{q}_2 = 1 - \hat{p}_2$$

Example

As personnel director, you want to test the perception of fairness of two methods of performance evaluation. 63 of 78 employees rated Method 1 as fair. 49 of 82 rated Method 2 as fair. Find a 99% confidence interval for the difference in perceptions.

Example Solution

$$\hat{p}_1 = \frac{63}{78} = 0.808, \qquad \hat{q}_1 = 1 - \hat{p}_1 = 1 - 0.808 = 0.192$$

$$\hat{p}_2 = \frac{49}{82} = 0.598, \qquad \hat{q}_2 = 1 - \hat{p}_2 = 1 - 0.598 = 0.402$$

$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = (0.808 - 0.598) \pm 2.58\sqrt{\frac{0.808\,(0.192)}{78} + \frac{0.598\,(0.402)}{82}}$$

$$0.029 \le p_1 - p_2 \le 0.391$$
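The two-sample interval above can be sketched as below. The function name `two_proportion_ci` is an illustrative assumption; the standard error is the unpooled form from the confidence interval formula, and the proportions are kept unrounded, so the last digit may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    diff = p1 - p2
    return diff - margin, diff + margin


# Method 1: 63 of 78 rated fair; Method 2: 49 of 82 rated fair.
lower, upper = two_proportion_ci(63, 78, 49, 82, 0.99)
print(f"{lower:.3f} <= p1 - p2 <= {upper:.3f}")
```

Because the whole interval lies above zero, it is consistent with the hypothesis test on the same data rejecting "no difference" at the .01 level.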

Large-Sample Inference about p1 – p2

Hypothesis Testing for Two Proportions

Hypothesis   No Difference /     Pop 1 ≥ Pop 2 /     Pop 1 ≤ Pop 2 /
             Any Difference      Pop 1 < Pop 2       Pop 1 > Pop 2
H0           p1 – p2 = 0         p1 – p2 ≥ 0         p1 – p2 ≤ 0
Ha           p1 – p2 ≠ 0         p1 – p2 < 0         p1 – p2 > 0

Z – Test Statistic:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where $(p_1 - p_2)$ is the hypothesized difference and

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}$$

The rejection region is determined in the same way as in the one-sample tests.

Example

As personnel director, you want to test the perception of fairness of two methods of performance evaluation. 63 of 78 employees rated Method 1 as fair. 49 of 82 rated Method 2 as fair. At the .01 level of significance, is there a difference in perceptions?

Example Solution

• H0: p1 – p2 = 0
• Ha: p1 – p2 ≠ 0
• α = .01
• n1 = 78, n2 = 82
• Critical Value(s): $Z_{.005} = \pm 2.58$ (reject H0 if Z < -2.58 or Z > 2.58; area .005 in each tail)

$$\hat{p}_1 = \frac{X_1}{n_1} = \frac{63}{78} = .808, \qquad \hat{p}_2 = \frac{X_2}{n_2} = \frac{49}{82} = .598$$

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{63 + 49}{78 + 82} = .70$$

Test Statistic:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(.808 - .598) - 0}{\sqrt{.70\,(1 - .70)\left(\dfrac{1}{78} + \dfrac{1}{82}\right)}} = +2.90$$

Decision: Reject H0 at α = .01

Conclusion: There is evidence of a difference in proportions
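The pooled two-sample test above can be checked with a short sketch. The function name `two_proportion_z` is an assumption made for illustration; the pooled standard error it uses is the one from the slides and is appropriate when the hypothesized difference is 0.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


alpha = 0.01
z = two_proportion_z(63, 78, 49, 82)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
reject = abs(z) > z_crit
print(f"Z = {z:.2f}, critical = {z_crit:.2f}, reject H0: {reject}")
```

The statistic should be close to the slide's +2.90, which exceeds the critical value of about 2.58.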

Chi-Square Tests for k Proportions

• This topic extends hypothesis testing to analyze differences between population proportions based on two or more samples.

• Qualitative data that fall in more than two categories often result from a multinomial experiment.

• Some of the characteristics of the multinomial experiment are:
    • The probabilities of the k outcomes, denoted p1, p2, … , pk, remain the same from trial to trial, where p1 + p2 + … + pk = 1
    • The trials are independent

• Recall that a binomial experiment is a multinomial experiment with k = 2

Chi-Square (χ²) Tests

[Diagram: Populations → Draw Sample → Observed and expected frequencies (x, e) → χ² Test for equality of proportions → Evidence to accept/reject our claim: p1 = p2 = p3 = p4 = … = pk]

Road Map

[Diagram: Decision Making → One/Two Samples, Analysis of Variance, χ² Tests; χ² Tests → One-Way Table, Two-Way Table]

Multinomial Experiment

• n identical and independent trials

• k outcomes to each trial

• Constant outcome probability, pk

• Random variable is count, nk

• Example: ask 100 people (n) which of 3 candidates (k) they will vote for

• Uses one-way contingency table: Shows number of observations in k independent groups (outcomes or variable levels)

One Way Contingency Table

Outcomes (k = 3)

Candidate              Tom    Bill   Mary   Total
Number of responses    35     20     45     100

χ² Test Basic Idea

1. Compares observed frequency (xi) to expected frequency (ei) assuming the null hypothesis is true

2. The closer the observed frequencies are to the expected frequencies, the more consistent the data are with H0
    • Measured by squared difference relative to expected frequency
    • Reject H0 for large values of the statistic

Assumptions:

1. A multinomial experiment has been conducted

2. The sample size n is large: ei is greater than or equal to 5 for every cell (i = 1, 2, 3, …, k)

χ² Test for k Proportions

1. Hypotheses
• H0: p1 = p1,0, p2 = p2,0, ..., pk = pk,0
• Ha: At least one pi is different from above

2. Test Statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i}$$

where $x_i$ is the observed frequency, $e_i = n\,p_{i,0}$ is the expected frequency, and $p_{i,0}$ is the hypothesized probability.

3. Degrees of Freedom: k – 1, where k is the number of outcomes

Finding Critical Value Example

What is the critical χ² value if k = 3, and α = .05?

χ² Table (Portion)

        Upper Tail Area
df    .995     …    .95      …    .05
1     ...      …    0.004    …    3.841
2     0.010    …    0.103    …    5.991

df = k - 1 = 2, so the critical value is 5.991. Reject H0 if χ² > 5.991 (upper tail area α = .05); otherwise do not reject H0.

If xi = ei, χ² = 0.
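The two table entries used in this topic (3.841 for df = 1 and 5.991 for df = 2) can be recovered without a printed table. The sketch below relies on two standard facts that are not stated in the slides: a χ² variable with 1 degree of freedom is the square of a standard normal variable, and a χ² variable with 2 degrees of freedom has upper-tail area exp(-x/2). Other degrees of freedom would need a table or a statistics library.

```python
from math import log
from statistics import NormalDist

alpha = 0.05

# df = 1: chi-square is the square of a standard normal, so the
# upper-tail critical value is the two-sided Z critical value squared.
crit_df1 = NormalDist().inv_cdf(1 - alpha / 2) ** 2

# df = 2: the upper-tail area is exp(-x / 2); solving
# exp(-x / 2) = alpha gives x = -2 * ln(alpha).
crit_df2 = -2 * log(alpha)

print(round(crit_df1, 3), round(crit_df2, 3))
```

Both values should match the .05 column of the table portion above.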

χ² Test for k Proportions Example

As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, 72 rated Method 3 as fair. At the .05 level of significance, is there a difference in perceptions?

χ² Test for k Proportions Solution

• H0: p1 = p2 = p3 = 1/3
• Ha: At least 1 is different
• α = .05
• n1 = 63, n2 = 45, n3 = 72
• Critical Value(s): 5.991 (reject H0 if χ² > 5.991; upper tail area α = .05)

x1 = 63, x2 = 45, x3 = 72

$$e_1 = e_2 = e_3 = \frac{180}{3} = 60$$

Test Statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i} = \frac{(63 - 60)^2}{60} + \frac{(45 - 60)^2}{60} + \frac{(72 - 60)^2}{60} = 6.3$$

Decision: Reject H0 at α = .05

Conclusion: There is evidence of a difference in proportions
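The statistic in the solution above is a direct sum over the k cells and can be sketched as follows. The function name `chi_square_statistic` is illustrative, and the critical value is copied from the slides' table rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Sum of (x_i - e_i)^2 / e_i over all cells."""
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))


observed = [63, 45, 72]                 # Methods 1, 2, 3 rated as fair
n = sum(observed)                       # 180 employees
hypothesized = [1 / 3, 1 / 3, 1 / 3]    # H0: p1 = p2 = p3 = 1/3
expected = [n * p for p in hypothesized]

# Large-sample condition from the slides: every expected count is at least 5.
assert min(expected) >= 5

chi2 = chi_square_statistic(observed, expected)
critical = 5.991                        # table value for df = 2, alpha = .05
reject = chi2 > critical
print(f"chi-square = {chi2:.1f}, reject H0: {reject}")
```

The statistic should reproduce the slide's 6.3, which exceeds 5.991.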

Road Map

[Diagram: Decision Making → One/Two Samples, Analysis of Variance, χ² Tests; χ² Tests → One-Way Table, Two-Way Table (Test of Independence)]

χ² Test of Independence

• Shows if a relationship exists between two qualitative (categorical) variables
    • One sample is drawn
    • Does not show causality

• Uses two-way contingency table

Assumptions:

1. Multinomial experiment has been conducted

2. The sample size, n, is large: eij is greater than or equal to 5 for every cell

Two-Way Contingency Table

Shows number of observations from 1 sample jointly in 2 qualitative variables

                 House Location
House Style    Urban   Rural   Total
Split-Level    63      49      112
Ranch          15      33      48
Total          78      82      160

Rows: levels of variable 1 (House Style). Columns: levels of variable 2 (House Location).

χ² Test of Independence

1. Hypotheses
• H0: Variables are independent
• Ha: Variables are related (dependent)

2. Test Statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{(x_{ij} - e_{ij})^2}{e_{ij}}$$

where $x_{ij}$ is the observed frequency and $e_{ij}$ is the expected frequency.

3. Degrees of Freedom: (r – 1)(c – 1), where r is the number of rows and c is the number of columns

χ² Test of Independence Expected Frequencies

1. Statistical independence means joint probability equals product of marginal probabilities

2. Compute marginal probabilities and multiply for joint probability

3. Expected frequency is sample size times joint probability

Expected Frequency Example

                 Location
House Style    Urban   Rural   Total
               Obs.    Obs.
Split-Level    63      49      112
Ranch          15      33      48
Total          78      82      160

Marginal probability (row total Ri) = 112/160

Marginal probability (column total Cj) = 78/160

Joint probability = (112/160) × (78/160)

Expected freq. = 160 × (112/160) × (78/160) = 54.6

Expected Frequency Calculation

$$e_{ij} = \frac{r_i\,c_j}{n}$$

ri: Total frequency in the i-th row

cj: Total frequency in the j-th column

                       House Location
                 Urban            Rural
House Style    Obs.   Exp.     Obs.   Exp.     Total
Split-Level    63     54.6     49     57.4     112
Ranch          15     23.4     33     24.6     48
Total          78     78       82     82       160

112×78/160 = 54.6, 112×82/160 = 57.4, 48×78/160 = 23.4, 48×82/160 = 24.6
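The expected-frequency rule above (row total times column total, divided by n) applies to a table of any size. A minimal sketch, with `expected_frequencies` as an illustrative function name and the table stored as a list of rows:

```python
def expected_frequencies(table):
    """Expected counts e_ij = r_i * c_j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Split-Level, Ranch; columns: Urban, Rural.
observed = [[63, 49],
            [15, 33]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

The printed rows should match the Exp. columns of the table: 54.6 and 57.4 for Split-Level, 23.4 and 24.6 for Ranch.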

Example

As a realtor you want to determine if house style and house location are related. At the .05 level of significance, is there evidence of a relationship?

                 House Location
House Style    Urban   Rural   Total
Split-Level    63      49      112
Ranch          15      33      48
Total          78      82      160

Example Solution

                       House Location
                 Urban            Rural
House Style    Obs.   Exp.     Obs.   Exp.     Total
Split-Level    63     54.6     49     57.4     112
Ranch          15     23.4     33     24.6     48
Total          78     78       82     82       160

112×78/160 = 54.6, 112×82/160 = 57.4, 48×78/160 = 23.4, 48×82/160 = 24.6

eij ≥ 5 in all cells

• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 – 1)(2 – 1) = 1
• Critical Value(s): 3.841 (reject H0 if χ² > 3.841; upper tail area α = .05)

Test Statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} = \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \cdots + \frac{(33 - 24.6)^2}{24.6} = 8.41$$

Decision: Reject H0 at α = .05

Conclusion: There is evidence of a relationship
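The full test of independence above combines the expected-frequency rule with the χ² sum. The sketch below is one way to put the pieces together; `chi_square_independence` is an illustrative name, and the critical value 3.841 is taken from the slides rather than computed.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count e_ij
            stat += (x - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: Split-Level, Ranch; columns: Urban, Rural.
houses = [[63, 49],
          [15, 33]]
chi2, df = chi_square_independence(houses)
critical = 3.841                        # table value for df = 1, alpha = .05
reject = chi2 > critical
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject}")
```

The statistic should round to the slide's 8.41 with 1 degree of freedom. Hand calculations that round the expected counts first can differ from this unrounded version in the second decimal place on other tables.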

Exercise 1

You're a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level of significance, is there evidence of a relationship?

               Diet Pepsi
Diet Coke    No     Yes    Total
No           84     32     116
Yes          48     122    170
Total        132    154    286

Exercise 1 Solution

                     Diet Pepsi
                 No               Yes
Diet Coke    Obs.   Exp.      Obs.   Exp.     Total
No           84     53.5      32     62.5     116
Yes          48     78.5      122    91.5     170
Total        132    132       154    154      286

116×132/286 = 53.5, 154×116/286 = 62.5, 170×132/286 = 78.5, 170×154/286 = 91.5

eij ≥ 5 in all cells

• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 – 1)(2 – 1) = 1
• Critical Value(s): 3.841 (reject H0 if χ² > 3.841; upper tail area α = .05)

Test Statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} = \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \cdots + \frac{(122 - 91.5)^2}{91.5} = 54.29$$

Decision: Reject H0 at α = .05

Conclusion: There is evidence of a relationship

Exercise 2

There is a statistically significant relationship between purchasing Diet Coke and Diet Pepsi. So what do you think the relationship is? Aren't they competitors?

               Diet Pepsi
Diet Coke    No     Yes    Total
No           84     32     116
Yes          48     122    170
Total        132    154    286

You Re-Analyze the Data

Low Income

               Diet Pepsi
Diet Coke    No     Yes    Total
No           4      30     34
Yes          40     2      42
Total        44     32     76

High Income

               Diet Pepsi
Diet Coke    No     Yes    Total
No           80     2      82
Yes          8      120    128
Total        88     122    210
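One way to see what the two income tables above reveal is to compare, within each table, the share of Diet Pepsi buyers among Diet Coke buyers and non-buyers. The sketch below is an illustration of that comparison, not an analysis the slides carry out; the helper name `pepsi_share_by_coke` is hypothetical.

```python
# Rows: Diet Coke (No, Yes); columns: Diet Pepsi (No, Yes).
low_income = [[4, 30], [40, 2]]
high_income = [[80, 2], [8, 120]]

# Adding the two strata cell by cell gives back the Exercise 1 table.
combined = [[a + b for a, b in zip(row_low, row_high)]
            for row_low, row_high in zip(low_income, high_income)]


def pepsi_share_by_coke(table):
    """Share of Diet Pepsi buyers among Diet Coke non-buyers and buyers."""
    no_coke, yes_coke = (row[1] / sum(row) for row in table)
    return no_coke, yes_coke


for name, table in [("low income", low_income),
                    ("high income", high_income),
                    ("combined", combined)]:
    no_coke, yes_coke = pepsi_share_by_coke(table)
    print(f"{name}: Pepsi share {no_coke:.2f} without Coke, {yes_coke:.2f} with Coke")
```

In the low-income group, Diet Coke buyers are much less likely to buy Diet Pepsi, while in the high-income group they are much more likely to. The combined table hides this reversal.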

True Relationships*

[Diagram: Diet Coke and Diet Pepsi are linked by an apparent relation; each is linked by an underlying causal relation to a control or intervening variable (true cause).]

Moral of the Story*

Numbers don't think - People do!