
Horng-Chyi Horng, Statistics II

Summary Table of Inference Procedures for a Single Sample (I) §4-8 (§8-6)


Summary Table of Inference Procedures for a Single Sample (II)


Testing for Goodness of Fit

In general, we do not know the underlying distribution of the population, and we wish to test the hypothesis that a particular distribution will be satisfactory as a population model.

Probability plotting can only be used to examine whether a population is normally distributed.

Histogram plotting and similar methods can only be used to guess the possible type of the underlying distribution.

§4-9 (§8-7)


Goodness-of-Fit Test (I)

Consider a random sample of size n from a population whose probability distribution is unknown.

These n observations are arranged in a frequency histogram having k bins or class intervals.

Let $O_i$ be the observed frequency in the ith class interval, and $E_i$ be the expected frequency in the ith class interval from the hypothesized probability distribution. The test statistic is

\[ X_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]


Goodness-of-Fit Test (II)

If the population follows the hypothesized distribution, $X_0^2$ has approximately a chi-square distribution with k-p-1 d.f., where p represents the number of parameters of the hypothesized distribution estimated by sample statistics.

That is,

\[ X_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{k-p-1} \]

Reject the hypothesis if

\[ X_0^2 > \chi^2_{\alpha,\,k-p-1} \]
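The statistic and its degrees of freedom can be sketched in a few lines of Python. This is a minimal illustration rather than code from the course: the function name `chi_square_gof` and the counts used below are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def chi_square_gof(observed, expected, n_estimated_params=0):
    """Return (X0^2, degrees of freedom) for a chi-square goodness-of-fit test.

    observed, expected: frequencies O_i and E_i for the k class intervals.
    n_estimated_params: p, the number of parameters estimated from the sample.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # X0^2 = sum over the k cells of (O_i - E_i)^2 / E_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Degrees of freedom: k - p - 1
    dof = len(observed) - n_estimated_params - 1
    return statistic, dof


# Hypothetical counts: three cells, each expected to hold 20 observations,
# with no parameters estimated from the data (p = 0).
stat, dof = chi_square_gof([18, 22, 20], [20.0, 20.0, 20.0])
print(stat, dof)
```

The hypothesis would be rejected when the returned statistic exceeds the tabulated chi-square critical value for the returned degrees of freedom.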


Goodness-of-Fit Test (III)

Class intervals are not required to be of equal width.

The minimum expected frequency cannot be too small; values of 3, 4, or 5 are typical minimums.

When the expected frequency of a class interval is too small, we can combine that interval with a neighboring class interval. Each such merge reduces k by one.
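One way to automate the merging rule is sketched below. The helper name `merge_small_cells`, the choice to merge into the left-hand neighbor (the right-hand one for the first cell), and the counts are all assumptions made for illustration; the slides do not prescribe which neighbor to use.

```python
def merge_small_cells(observed, expected, min_expected=3.0):
    """Merge cells whose expected frequency is below min_expected.

    A small cell is folded into its left neighbor (or its right neighbor
    if it is the first cell). Each merge reduces the number of cells k by one.
    """
    obs = list(observed)
    exp = list(expected)
    while len(exp) > 1:
        # Index of the first cell that is still too small, if any.
        small = next((i for i, e in enumerate(exp) if e < min_expected), None)
        if small is None:
            break
        target = small - 1 if small > 0 else 1
        obs[target] += obs[small]
        exp[target] += exp[small]
        del obs[small]
        del exp[small]
    return obs, exp


# Hypothetical frequencies: the last cell has an expected count of 2.0 (< 3).
merged_obs, merged_exp = merge_small_cells([10, 11, 4, 3], [12.0, 9.0, 5.0, 2.0])
print(merged_obs, merged_exp)
```

Because merging changes k, the degrees of freedom k-p-1 should be computed from the number of cells after merging.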


Example 8-18

The number of defects in printed circuit boards is hypothesized to follow a Poisson distribution. A random sample of 60 printed boards has been collected, and the observed numbers of defects are shown in the table below:

Number of defects    Observed frequency
0                    32
1                    15
2                    9
3                    4

The only parameter in the Poisson distribution is λ, which can be estimated by the sample mean = {0(32) + 1(15) + 2(9) + 3(4)}/60 = 0.75. Therefore, the expected frequency for the first cell is:

\[ p_1 = P(X = 0) = \frac{e^{-0.75}(0.75)^0}{0!} = 0.472, \qquad E_1 = 0.472 \times 60 = 28.32 \]


Example 8-18 (Cont.)

Since the expected frequency in the last cell is less than 3, we combine the last two cells:

Number of defects    Observed frequency    Expected frequency
0                    32                    28.32
1                    15                    21.24
2 (or more)          13                    10.44


Example 8-18 (Cont.)

1. The variable of interest is the form of the distribution of defects in printed circuit boards.

2. $H_0$: The form of the distribution of defects is Poisson.
   $H_1$: The form of the distribution of defects is not Poisson.

3. k = 3, p = 1, so k-p-1 = 1 d.f.

4. At α = 0.05, we reject $H_0$ if $X_0^2 > \chi^2_{0.05,1} = 3.84$.

5. The test statistic is:

\[ X_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(32 - 28.32)^2}{28.32} + \frac{(15 - 21.24)^2}{21.24} + \frac{(13 - 10.44)^2}{10.44} = 2.94 \]

6. Since $X_0^2 = 2.94 < \chi^2_{0.05,1} = 3.84$, we are unable to reject the null hypothesis that the distribution of defects in printed circuit boards is Poisson.
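The arithmetic of this example can be re-traced with a short Python sketch. It takes the observed counts as 32, 15, 9, and 4 boards with 0, 1, 2, and 3 defects (these sum to the stated sample size of 60), and it rounds the Poisson probabilities to three decimals, which is an assumption about how the hand-calculated figures were obtained. The critical value 3.84 is copied from the example rather than computed.

```python
import math

observed = [32, 15, 9, 4]   # boards with 0, 1, 2, 3 defects
n = sum(observed)           # 60 boards in total

# Estimate the Poisson mean by the sample mean: 45 / 60 = 0.75.
lam = sum(x * o for x, o in enumerate(observed)) / n

# Cell probabilities for 0, 1, and "2 or more" defects, rounded to three
# decimals to mirror a hand calculation (an assumption, see the lead-in).
p0 = round(math.exp(-lam), 3)        # P(X = 0)
p1 = round(lam * math.exp(-lam), 3)  # P(X = 1)
p2_plus = 1 - p0 - p1                # P(X >= 2), the merged last cell

expected = [n * p0, n * p1, n * p2_plus]
merged_observed = [observed[0], observed[1], observed[2] + observed[3]]

# X0^2 = sum of (O_i - E_i)^2 / E_i over the k = 3 merged cells.
statistic = sum((o - e) ** 2 / e for o, e in zip(merged_observed, expected))
dof = len(expected) - 1 - 1          # k - p - 1 with k = 3, p = 1
critical_value = 3.84                # value quoted in the example for alpha = 0.05, 1 d.f.

reject = statistic > critical_value
print(round(statistic, 2), dof, reject)
```

With the rounded probabilities the statistic rounds to 2.94, matching step 5. Using unrounded probabilities shifts the statistic slightly in the second decimal place but does not change the conclusion.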


Contingency Table Tests: Example 8-20

A company has to choose among three pension plans. Management wishes to know whether the preference for plans is independent of job classification and wants to use α = 0.05. The opinions of a random sample of 500 employees are shown in Table 8-4.

(§8-8)


Contingency Table Test - The Problem Formulation (I)

There are two classifications; one has r levels and the other has c levels (here, 3 pension plans and 2 types of workers).

We want to know whether the two methods of classification are statistically independent (here, whether the preference for pension plans is independent of job classification).

The table:


Contingency Table Test - The Problem Formulation (II)

Let $p_{ij}$ be the probability that a randomly selected element falls in the ijth cell, given that the two classifications are independent. Then $p_{ij} = u_i v_j$, where the estimators for $u_i$ and $v_j$ are

\[ \hat{u}_i = \frac{1}{n}\sum_{j=1}^{c} O_{ij}, \qquad \hat{v}_j = \frac{1}{n}\sum_{i=1}^{r} O_{ij} \]

Therefore, the expected frequency of each cell is

\[ E_{ij} = n\,\hat{u}_i\,\hat{v}_j = \frac{1}{n}\sum_{j=1}^{c} O_{ij}\sum_{i=1}^{r} O_{ij} \]

Then, for large n, the statistic

\[ X_0^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

has an approximate chi-square distribution with (r-1)(c-1) d.f.
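The expected frequencies and the test statistic for an r x c table can be sketched as follows. The function name `contingency_chi_square` is hypothetical, and the 2 x 3 counts are made up for illustration; they are not the Table 8-4 data, which is not reproduced in this text.

```python
def contingency_chi_square(table):
    """Return (X0^2, degrees of freedom, expected frequencies) for an r x c table.

    table: list of r rows, each a list of c observed counts O_ij.
    """
    r = len(table)
    c = len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]
    # E_ij = n * u_i * v_j = (row total i) * (column total j) / n
    expected = [
        [row_totals[i] * col_totals[j] / n for j in range(c)]
        for i in range(r)
    ]
    # X0^2 = sum over all cells of (O_ij - E_ij)^2 / E_ij
    statistic = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(r)
        for j in range(c)
    )
    dof = (r - 1) * (c - 1)
    return statistic, dof, expected


# Hypothetical 2 x 3 table of counts (NOT the data of Table 8-4).
counts = [[30, 20, 10],
          [20, 20, 20]]
stat, dof, expected = contingency_chi_square(counts)
print(stat, dof, expected)
```

Independence would be rejected when the statistic exceeds the chi-square critical value with (r-1)(c-1) degrees of freedom at the chosen α.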


Example 8-20
