sampling part2

Upload: nikhilsingh
Post on 15-Sep-2015
Description: Sampling theory in maths (notes)

Sampling Theory

F-test: (Test for equality of population variances using the F-distribution)

Suppose we want to test whether two independent samples $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ have been drawn from normal populations with the same variance $\sigma^2$. Let

$$F = \frac{\sum_{i=1}^{n_1} (x_i - \bar{x})^2 / (n_1 - 1)}{\sum_{i=1}^{n_2} (y_i - \bar{y})^2 / (n_2 - 1)} = \frac{n_1 S_1^2 / (n_1 - 1)}{n_2 S_2^2 / (n_2 - 1)},$$

since $S_1^2 = \frac{1}{n_1} \sum_{i=1}^{n_1} (x_i - \bar{x})^2$ and $S_2^2 = \frac{1}{n_2} \sum_{i=1}^{n_2} (y_i - \bar{y})^2$. That is,

$$F = \frac{u^2}{v^2}, \quad \text{where } u^2 = \frac{n_1 S_1^2}{n_1 - 1} \text{ and } v^2 = \frac{n_2 S_2^2}{n_2 - 1}.$$

The test statistic $F$ defined above follows the F-distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom, the greater of $u^2$ and $v^2$ always being taken in the numerator of $F$.

Note: The table value for $(n_1 - 1, n_2 - 1)$ degrees of freedom is different from the value for $(n_2 - 1, n_1 - 1)$ degrees of freedom.

Working procedure

1. Set the null hypothesis $H_0: \sigma_x^2 = \sigma_y^2 = \sigma^2$.
2. Compute the test statistic $F = \dfrac{n_1 S_1^2 / (n_1 - 1)}{n_2 S_2^2 / (n_2 - 1)}$.
3. Compare the calculated value of $F$ for the two given samples with the tabulated value of $F$ for $(n_1 - 1, n_2 - 1)$ degrees of freedom at the required level of significance. $H_0$ is accepted if the calculated value is less than the tabulated value.

Problems

1. Test the equality of standard deviations for the data given below at 5% level of significance: $n_1 = 10$; $n_2 = 14$; $S_1 = 1.5$; $S_2 = 1.2$.

Soln: $H_0: \sigma_x^2 = \sigma_y^2 = \sigma^2$

$$F = \frac{n_1 S_1^2 / (n_1 - 1)}{n_2 S_2^2 / (n_2 - 1)} = \frac{10 (1.5)^2 / 9}{14 (1.2)^2 / 13} = \frac{2.5}{1.55} = 1.61 = F_{cal}$$

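($F_{cal}$ is the calculated value of $F$.) The table value for $(n_1 - 1, n_2 - 1) = (9, 13)$ degrees of freedom at 5% level of significance is $F_{0.05} = 2.71$. ($F_{0.05}$ is the tabulated value of $F$ for 5% level of significance.)

As $F_{cal} = 1.61 < F_{0.05} = 2.71$, $H_0$ is accepted. Therefore, the difference is not significant.

Note: The numerator value (2.5) is more than the denominator value (1.55), and hence $F_{0.05}$ was looked up for $(n_1 - 1, n_2 - 1)$ degrees of freedom. Otherwise, we would have to take the test statistic as $F = v^2 / u^2$, and $F_{0.05}$ would then have to be determined for $(n_2 - 1, n_1 - 1)$ degrees of freedom.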
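The working procedure can be sketched in Python. This is a minimal illustration and not part of the original notes: the function names `variance_estimate` and `f_statistic` are hypothetical, and the table value 2.71 is simply the one quoted above for (9, 13) degrees of freedom.

```python
def variance_estimate(n, s):
    """Return n*S^2/(n-1), where S is the sample S.D. computed with divisor n."""
    return n * s ** 2 / (n - 1)


def f_statistic(n1, s1, n2, s2):
    """Return (F, numerator d.f., denominator d.f.).

    The larger variance estimate is always placed in the numerator,
    and the degrees of freedom are swapped accordingly.
    """
    u2 = variance_estimate(n1, s1)
    v2 = variance_estimate(n2, s2)
    if u2 >= v2:
        return u2 / v2, n1 - 1, n2 - 1
    return v2 / u2, n2 - 1, n1 - 1


# Data of Problem 1: n1 = 10, S1 = 1.5, n2 = 14, S2 = 1.2
f_cal, df_num, df_den = f_statistic(10, 1.5, 14, 1.2)
f_table = 2.71  # table value quoted in the text for (9, 13) d.f. at 5%

print(round(f_cal, 2), (df_num, df_den))
print("H0 accepted" if f_cal < f_table else "H0 rejected")
```

The degrees of freedom are returned in the order (numerator, denominator), which is the order in which the F-table has to be consulted.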

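2. In one sample of 8 observations the sum of the squares of deviations of the sample values from the sample mean was 84.4, and in another sample of 10 observations it was 102.6. Test whether the difference in variance is significant at 5% level using the F-test.

Soln: Given $n_1 = 8$, $\sum (x_i - \bar{x})^2 = 84.4$; $n_2 = 10$, $\sum (y_i - \bar{y})^2 = 102.6$.

Under the null hypothesis $H_0: \sigma_x^2 = \sigma_y^2 = \sigma^2$,

$$u^2 = \frac{84.4}{7} = 12.06, \quad v^2 = \frac{102.6}{9} = 11.4, \quad F = \frac{u^2}{v^2} = \frac{12.06}{11.4} = 1.06 = F_{cal}.$$

The table value for (7, 9) degrees of freedom (d.f.) at 5% level of significance is $F_{0.05} = 3.29$.

As $F_{cal} < F_{0.05}$, $H_0$ is accepted. Therefore, the difference is not significant at 5% level of significance.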

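3. Two random samples drawn from 2 normal populations are given below. Examine whether the samples have been drawn from normal populations having the same variance.

Sample A   28   30   32   33   31   29   34
Sample B   29   30   30   24   27   28   -

Soln: $H_0: \sigma_x^2 = \sigma_y^2 = \sigma^2$

Given $n_1 = 7$, $\bar{x} = 217/7 = 31$, $\sum (x_i - \bar{x})^2 = 28$, so that $u^2 = 28/6 = 4.67$.

Similarly, $n_2 = 6$, $\bar{y} = 168/6 = 28$, $\sum (y_i - \bar{y})^2 = 26$, so that $v^2 = 26/5 = 5.2$.

If we take $F = u^2 / v^2$, then the numerator value is less than the denominator, and hence we have to take

$$F = \frac{v^2}{u^2} = \frac{5.2}{4.67} = 1.11 = F_{cal}.$$

The table value of $F_{0.05}$ for $(n_2 - 1, n_1 - 1) = (5, 6)$ degrees of freedom is 4.39.

As $F_{cal} < F_{0.05}$, $H_0$ is accepted.

Hence the two samples could have been drawn from populations having the same variance.

4. The daily wages in rupees of skilled workers in two cities are as follows.

City      Size of sample of workers   S.D. of wages in the sample
City A    16                          25
City B    13                          32

Test at 5% level the equality of variances of the wage distribution in the two cities.

Soln: $H_0: \sigma_x^2 = \sigma_y^2 = \sigma^2$

Given $n_1 = 16$, $S_1 = 25$; $n_2 = 13$, $S_2 = 32$.

$$u^2 = \frac{16 (25)^2}{15} = 666.67, \quad v^2 = \frac{13 (32)^2}{12} = 1109.33.$$

Since $v^2 > u^2$, we have to take $F$ as

$$F = \frac{v^2}{u^2} = \frac{1109.33}{666.67} = 1.66 = F_{cal}.$$

The table value of $F_{0.05}$ for $(n_2 - 1, n_1 - 1) = (12, 15)$ degrees of freedom is 2.48.

As $F_{cal} < F_{0.05}$, $H_0$ is accepted.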
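When the raw observations are available, as in Problem 3, the variance estimates can be computed directly. The sketch below is only an illustration: the helper name `f_from_samples` is hypothetical, and it relies on the standard-library function `statistics.variance`, which divides the sum of squared deviations by n - 1.

```python
from statistics import variance


def f_from_samples(x, y):
    """Return (F, numerator d.f., denominator d.f.) for two raw samples.

    statistics.variance uses the divisor n - 1, so it gives u^2 and v^2
    directly. The larger estimate goes in the numerator.
    """
    u2 = variance(x)
    v2 = variance(y)
    if u2 >= v2:
        return u2 / v2, len(x) - 1, len(y) - 1
    return v2 / u2, len(y) - 1, len(x) - 1


# Data of Problem 3
sample_a = [28, 30, 32, 33, 31, 29, 34]
sample_b = [29, 30, 30, 24, 27, 28]

f_cal, df_num, df_den = f_from_samples(sample_a, sample_b)
print(round(f_cal, 2), (df_num, df_den))
```

Here sample B has the larger variance estimate, so the statistic is reported with (5, 6) degrees of freedom rather than (6, 5).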

Similar problems for practice

1. For two samples of sizes 8 and 12 the observed variances are 0.064 and 0.024. Test the hypothesis that the samples came from normal populations with equal variances.

2. In a sample of 8 observations the sum of the squared deviations of items from the mean was 94.5. In another sample of 10 observations the value was found to be 101.7. Test whether the difference is significant.

3. Two random samples drawn from two normal populations are

A   63   65   68   69   71   72   -    -    -    -
B   63   62   65   66   69   69   70   71   72   73

Test whether the two populations have the same variance.

Examples on Fitting theoretical distribution to a given collection of observed data:

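Example (1): Suppose five unbiased coins are tossed and the number of heads is noted. The experiment is repeated 64 times and the following distribution is obtained.

No. of heads   0    1    2    3    4    5    Total
Frequencies    3    6    24   26   4    1    N = 64

Let us try to fit a binomial distribution to this data.

Here $m = 5$ and $N = 64$. As the coins are unbiased, we have $p = q = 1/2$, so that $P(x) = \binom{5}{x} (1/2)^5$, $x = 0, 1, \ldots, 5$.

x       P(x)     Expected frequencies 64 P(x)   Observed frequencies
0       1/32     2                              3
1       5/32     10                             6
2       10/32    20                             24
3       10/32    20                             26
4       5/32     10                             4
5       1/32     2                              1
Total   1        64                             64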
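The expected frequencies of a binomial fit can be checked with a few lines of Python. This is a sketch only: the function name `binomial_expected` is hypothetical, and `m` denotes the number of coins tossed in each repetition.

```python
from math import comb


def binomial_expected(m, p, total):
    """Expected frequencies total * C(m, x) * p^x * (1 - p)^(m - x), x = 0..m."""
    return [total * comb(m, x) * p ** x * (1 - p) ** (m - x)
            for x in range(m + 1)]


# Example (1): five unbiased coins, experiment repeated 64 times
expected = binomial_expected(5, 0.5, 64)
print(expected)  # [2.0, 10.0, 20.0, 20.0, 10.0, 2.0]
```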

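Example (2): Let us fit a Poisson distribution to the following data.

x   0     1    2    3   4   Total
f   123   59   14   3   1   200

Mean: $\lambda = \dfrac{\sum f x}{\sum f} = \dfrac{0 + 59 + 28 + 9 + 4}{200} = \dfrac{100}{200} = 0.5$, so that $P(x) = \dfrac{e^{-0.5} (0.5)^x}{x!}$.

x       Expected frequencies 200 P(x)   Observed frequencies
0       121.31                          123
1       60.65                           59
2       15.16                           14
3       2.53                            3
4       0.32                            1
Total   199.97 (approximately N)        N = 200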
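The Poisson fit follows the same pattern: estimate the mean from the observed distribution, then multiply each probability by the total frequency. The sketch below is illustrative only, and the function name `poisson_expected` is hypothetical.

```python
from math import exp, factorial


def poisson_expected(xs, freqs):
    """Return (mean, expected frequencies) for a Poisson fit.

    The mean is estimated as sum(f * x) / sum(f), and the expected
    frequency of x is N * exp(-mean) * mean^x / x!.
    """
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(xs, freqs)) / total
    expected = [total * exp(-mean) * mean ** x / factorial(x) for x in xs]
    return mean, expected


# Example (2)
mean, expected = poisson_expected([0, 1, 2, 3, 4], [123, 59, 14, 3, 1])
print(mean)
print([round(e, 2) for e in expected])
```

The expected frequencies do not add up exactly to 200 because the classes x = 5, 6, ... carry a small remaining probability.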

$\chi^2$-test to test the goodness of fit:

Let $O_1, O_2, \ldots, O_n$ be the observed frequencies and $E_1, E_2, \ldots, E_n$ be the corresponding expected frequencies such that $\sum O_i = \sum E_i = N$, where $N$ is the number of members in the population.

Suppose we intend to test the null hypothesis

$H_0$: The theoretical frequency distribution is a good fit to the observed frequency distribution

against the alternative hypothesis

$H_1$: The theoretical frequency distribution is not a good fit to the observed frequency distribution.

To test $H_0$ against $H_1$, the chi-square test of goodness of fit is applied. Here, the test statistic is

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}.$$

Under $H_0$ this is a chi-square variate with $(n - c)$ degrees of freedom, where $n$ is the number of terms in the $\chi^2$ (after pooling the frequencies which are less than 5 with the adjacent ones; refer to the examples) and $c$ is the number of constraints. The theoretical frequencies are computed such that $\sum E_i = \sum O_i$. This is one constraint. Apart from this, if any parameter is estimated from the observed distribution, every such estimation is a constraint. Thus, the value of $c$ is one more than the number of parameters estimated from the observed distribution.

Note (1): The $\chi^2$-test is one-tailed (right-tailed), i.e. if $\chi^2_{cal} < \chi^2_{\alpha}$ then $H_0$ is accepted; otherwise $H_0$ is rejected.

Note (2): The chi-square test of goodness of fit is applicable subject to the following conditions.

1. The observations should be independent.
2. The total frequency $N$ should be large.
3. The theoretical frequencies should be 5 or more. If any $E_i$ is less than 5, it should be pooled with the adjacent frequency.
4. If any parameter is estimated from the observed distribution, one degree of freedom should be subtracted for every such estimation.

Problems

1. The following data relates to the number of mistakes in each page of a book containing 180 pages.

No. of mistakes per page   0     1    2    3   4   5 or more   Total
No. of pages               130   32   15   2   1   0           180

Test whether the Poisson distribution is a good fit to this observed distribution.

Soln: $H_0$: Poisson distribution is a good fit to the observed distribution.

The alternative hypothesis is $H_1$: Poisson distribution is not a good fit to the observed distribution.

To test $H_0$, we fit a Poisson distribution to the data.

Here the parameter $\lambda$ is estimated by finding the mean from the observed distribution:

$$\lambda = \frac{\sum f x}{\sum f} = \frac{0 + 32 + 30 + 6 + 4 + 0}{180} = \frac{72}{180} = 0.4.$$

The expected frequencies are $E = 180 \, P(x)$ with $P(x) = \dfrac{e^{-0.4} (0.4)^x}{x!}$, rounded to the nearest integer.

x           P(x)     E     O
0           0.6703   121   130
1           0.2681   48    32
2           0.0536   10    15
3           0.0072   1     2
4           0.0007   0     1
5 or more   0.0001   0     0
Tot         1        180   180

Here, the last three theoretical (expected) frequencies are less than 5. Therefore, they are pooled with the adjacent one (x = 2) such that, finally, all the frequencies are 5 or more. After pooling, we have

O       E     (O - E)^2   (O - E)^2 / E
130     121   81          0.6694
32      48    256         5.3333
18      11    49          4.4545
Total                     10.4572

The test statistic is $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E} = 10.4572$.

$n = 3$ (after pooling) and $c = 2$ (1 for $\sum O = \sum E$, which is common in all cases, and 1 for estimating the parameter $\lambda$ from the observed distribution).

Thus, $\chi^2$ is a chi-square variate with $n - c = 3 - 2 = 1$ degree of freedom.

Now, from the chi-square distribution table, the value of $\chi^2$ at 5% level of significance for 1 d.f. is $\chi^2_{0.05} = 3.84$.

As $\chi^2_{cal} > \chi^2_{0.05}$, $H_0$ is rejected.

Conclusion: Poisson distribution is not a good fit to the observed distribution.

2. To an observed frequency distribution, a binomial distribution is fitted after estimating $p$ from the observed data. The observed and theoretical frequencies are given below.

x             0   1   2    3    4    5    6   7   Tot
Observed      3   3   17   31   28   11   1   2   96
Theoretical   1   7   19   27   24   13   4   1   96

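Test whether the binomial distribution is a good fit.

Soln:

$H_0$: Binomial distribution is a good fit.

x     O    E
0     3    1
1     3    7
2     17   19
3     31   27
4     28   24
5     11   13
6     1    4
7     2    1
Tot   96   96

The frequencies are pooled in such a way that none of the theoretical frequencies is less than 5. However, observed frequencies may be less than 5. Here the classes 0 and 1 are pooled (O = 6, E = 8), and the classes 6 and 7 are pooled (O = 3, E = 5).

After pooling, we have

O       E    (O - E)^2   (O - E)^2 / E
6       8    4           0.5
17      19   4           0.2105
31      27   16          0.5926
28      24   16          0.6667
11      13   4           0.3077
3       5    4           0.8
Total                    3.0775

The test statistic is $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E} = 3.0775$.

$n = 6$ (after pooling) and $c = 2$ (1 for $\sum O = \sum E$, which is common in all cases, and 1 for estimating the parameter $p$ from the observed distribution).

Thus, $\chi^2$ is a chi-square variate with $n - c = 6 - 2 = 4$ degrees of freedom.

Now, from the chi-square distribution table, the value of $\chi^2$ at 5% level of significance for 4 d.f. is $\chi^2_{0.05} = 9.49$.

As $\chi^2_{cal} < \chi^2_{0.05}$, $H_0$ is accepted.

Conclusion: Binomial distribution is a good fit.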
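The pooling step and the statistic can be sketched in Python as follows. This is one possible pooling rule, stated as an assumption: classes are merged from left to right until the pooled expected frequency reaches 5, and any leftover tail is merged into the last pooled class. The function names are hypothetical.

```python
def pool_small_classes(observed, expected, minimum=5):
    """Merge adjacent classes so that every pooled expected frequency >= minimum."""
    pooled = []
    o_acc = e_acc = 0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= minimum:
            pooled.append([o_acc, e_acc])
            o_acc = e_acc = 0
    if o_acc or e_acc:  # leftover tail whose expected total is below minimum
        if pooled:
            pooled[-1][0] += o_acc
            pooled[-1][1] += e_acc
        else:
            pooled.append([o_acc, e_acc])
    return pooled


def chi_square(pooled):
    """Sum of (O - E)^2 / E over the pooled classes."""
    return sum((o - e) ** 2 / e for o, e in pooled)


# Data of Problem 2 (binomial fit, one parameter estimated, so c = 2)
observed = [3, 3, 17, 31, 28, 11, 1, 2]
expected = [1, 7, 19, 27, 24, 13, 4, 1]

pooled = pool_small_classes(observed, expected)
chi2 = chi_square(pooled)
dof = len(pooled) - 2

print(pooled)
print(round(chi2, 4), dof)
```

Applied to the data of Problem 1 (observed 130, 32, 15, 2, 1, 0 and expected 121, 48, 10, 1, 0, 0), the same rule gives the three pooled classes used there.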

3. 10000 digits are randomly chosen from a telephone directory and the following data is obtained.

Digit   Frequency
0       926
1       1207
2       1097
3       1066
4       1275
5       833
6       1007
7       872
8       864
9       853
Total   10000

Test whether there is equi-distribution in the telephone directory at 1% level of significance.

Soln:

$H_0$: The digits are equi-distributed in the telephone directory.

The expected (theoretical) frequencies corresponding to each of the digits should be equal, i.e. $E_i = 10000 / 10 = 1000$.

O       E      (O - E)^2   (O - E)^2 / E
926     1000   5476        5.476
1207    1000   42849       42.849
1097    1000   9409        9.409
1066    1000   4356        4.356
1275    1000   75625       75.625
833     1000   27889       27.889
1007    1000   49          0.049
872     1000   16384       16.384
864     1000   18496       18.496
853     1000   21609       21.609
Total                      222.142

The test statistic is $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E} = 222.142$.

$n = 10$ and $c = 1$ (1 for $\sum O = \sum E$).

Thus, $\chi^2$ is a chi-square variate with $n - c = 10 - 1 = 9$ degrees of freedom.

Now, from the chi-square distribution table, the value of $\chi^2$ at 1% level of significance for 9 d.f. is $\chi^2_{0.01} = 21.67$.

As $\chi^2_{cal} > \chi^2_{0.01}$, $H_0$ is rejected.

Conclusion: In the telephone directory, the digits are not equi-distributed.
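For an equi-distribution hypothesis every expected frequency is the total divided by the number of classes, so the whole computation is short. The following Python lines are a sketch with hypothetical names, using the digit counts of Problem 3.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


digit_counts = [926, 1207, 1097, 1066, 1275, 833, 1007, 872, 864, 853]

total = sum(digit_counts)                 # 10000
k = len(digit_counts)                     # 10 classes
expected = [total / k] * k                # 1000 for every digit

chi2 = chi_square(digit_counts, expected)
dof = k - 1                               # only the constraint sum(O) = sum(E)

print(round(chi2, 3), dof)
```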

4. According to a theory in Genetics, the proportion of beans of four types A, B, C and D in a generation should be 9:3:3:1. In an experiment, among 1600 beans, the frequencies of beans of the above four types were 882, 313, 287 and 118 respectively. Does the result support the theory?

Soln:

$H_0$: The result of the experiment supports the theory.

Under $H_0$, the expected frequencies should be in the ratio 9:3:3:1, i.e. $1600 \times \frac{9}{16} = 900$, $1600 \times \frac{3}{16} = 300$, $300$ and $1600 \times \frac{1}{16} = 100$.

O       E     (O - E)^2   (O - E)^2 / E
882     900   324         0.36
313     300   169         0.56
287     300   169         0.56
118     100   324         3.24
Total                     4.72

The test statistic is $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E} = 4.72$.

$n = 4$ and $c = 1$ (1 for $\sum O = \sum E$).

Thus, $\chi^2$ is a chi-square variate with $n - c = 4 - 1 = 3$ degrees of freedom.

Now, from the chi-square distribution table, the value of $\chi^2$ at 5% level of significance for 3 d.f. is $\chi^2_{0.05} = 7.81$.

As $\chi^2_{cal} < \chi^2_{0.05}$, $H_0$ is accepted.

Conclusion: The result of the experiment supports the theory.
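When the hypothesis specifies a ratio, the expected frequencies are obtained by splitting the total in that ratio. A small Python sketch, with the hypothetical helper `expected_from_ratio`, applied to the bean data:

```python
def expected_from_ratio(ratio, total):
    """Split `total` in the given ratio, e.g. 9:3:3:1."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


observed = [882, 313, 287, 118]
expected = expected_from_ratio([9, 3, 3, 1], sum(observed))

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(expected)
print(round(chi2, 2))
```

The unrounded statistic is slightly larger than the tabulated total because each term 169/300 is rounded down to 0.56 in the table.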

5. In order to test whether a die is biased, it is thrown 72 times and the results are tabulated as follows:

Result of throw    1   2    3    4   5    6    Tot
Number of throws   8   14   15   9   13   13   72

What is your conclusion?

Soln: $H_0$: The die is unbiased.

Under $H_0$, all the sides of the die are equiprobable. Therefore, their frequencies should be equal. So, the theoretical frequencies are $E_i = 72 / 6 = 12$.

Result   O    E    (O - E)^2   (O - E)^2 / E
1        8    12   16          1.3333
2        14   12   4           0.3333
3        15   12   9           0.75
4        9    12   9           0.75
5        13   12   1           0.0833
6        13   12   1           0.0833
Total                          3.33

The test statistic is $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E} = 3.33$.

$n = 6$ and $c = 1$ (1 for $\sum O = \sum E$).

Thus, $\chi^2$ is a chi-square variate with $n - c = 6 - 1 = 5$ degrees of freedom.

Now, from the chi-square distribution table, the value of $\chi^2$ at 5% level of significance for 5 d.f. is $\chi^2_{0.05} = 11.07$.

As $\chi^2_{cal} < \chi^2_{0.05}$, $H_0$ is accepted.

Conclusion: The die is unbiased.

6. A survey of 64 families with 3 children each is conducted and the number of male children in each family is noted. The results are tabulated as follows:

Male children   0   1    2    3    Total
Families        6   19   29   10   64

Apply the chi-square test of goodness of fit to test whether male and female children are equiprobable.

Soln:

$H_0$: Male and female children are equiprobable. (Probability of a male child is 0.5.)

Under $H_0$, a binomial distribution can be fitted to the given data ($m = 3$ and $p = 0.5$), with $P(x) = \binom{3}{x} (0.5)^3$.

x       P(x)   Expected frequencies 64 P(x)
0       1/8    8
1       3/8    24
2       3/8    24
3       1/8    8
Total   1      64

O       E    (O - E)^2   (O - E)^2 / E
6       8    4           0.5
19      24   25          1.042
29      24   25          1.042
10      8    4           0.5
Total                    3.084

The test statistic is $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E} = 3.084$.

$n = 4$ and $c = 1$ (1 for $\sum O = \sum E$; note that neither $m$ nor $p$ is estimated).

Thus, $\chi^2$ is a chi-square variate with $n - c = 4 - 1 = 3$ degrees of freedom.

Now, from the chi-square distribution table, the value of $\chi^2$ at 5% level of significance for 3 d.f. is $\chi^2_{0.05} = 7.81$.

As $\chi^2_{cal} < \chi^2_{0.05}$, $H_0$ is accepted.

Conclusion: Male and female children are equiprobable.
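The complete test for a fully specified binomial hypothesis can be put together as below. This is a sketch under stated assumptions: the variable names are hypothetical, and the critical value 7.81 is the usual 5% table value of chi-square for 3 degrees of freedom, entered by hand rather than computed.

```python
from math import comb

observed = [6, 19, 29, 10]        # families with 0, 1, 2, 3 male children
m, p = 3, 0.5                     # both fixed by H0, nothing is estimated
total = sum(observed)

# Expected frequencies N * C(m, x) * p^x * (1 - p)^(m - x)
expected = [total * comb(m, x) * p ** x * (1 - p) ** (m - x)
            for x in range(m + 1)]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1           # c = 1, only sum(O) = sum(E)
critical = 7.81                   # assumed table value, 5% level, 3 d.f.

print(expected)
print(round(chi2, 3), dof)
print("H0 accepted" if chi2 < critical else "H0 rejected")
```

If `p` had been estimated from the data instead of being fixed by the hypothesis, one more degree of freedom would have to be subtracted.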

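Similar problems for practice

1. Among 64 offspring of a certain cross between guinea pigs, 34 were red, 10 were black and 20 were white. According to the genetic model these numbers should be in the ratio 9:3:4. Are the data consistent with the model at 5% level?

Hint: $H_0$: Data are consistent with the model. Expected frequencies: 36, 12, 16; $\chi^2_{cal} = 1.44$; $\chi^2_{0.05} = 5.99$ at 2 d.f. As $\chi^2_{cal} < \chi^2_{0.05}$, $H_0$ is accepted.

2. The following table gives the number of train accidents in a country that occurred during the various days of the week. Find whether the accidents are uniformly distributed over the week.

Hint: $H_0$: Accidents are uniformly distributed over the week. If the accidents are uniformly distributed, the same number of accidents is expected to happen per day, so $E_i = N / 7$ for all the days of the week. The statistic has $7 - 1 = 6$ d.f. As $\chi^2_{cal}$ is less than the table value, $H_0$ is accepted.

3. Five coins are tossed 320 times. The number of heads observed is given below. Examine whether the coin is unbiased.

Hint: $H_0$: The coin is unbiased ($p = 1/2$). $m = 5$; $N = 320$; expected frequencies $320 \binom{5}{x} (1/2)^5$: 10, 50, 100, 100, 50, 10. The statistic has $6 - 1 = 5$ d.f. As $\chi^2_{cal}$ is greater than the table value, $H_0$ is rejected.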

4. A survey of 320 families with 5 children each revealed the following information.

No. of boys       5    4    3     2    1    0
No. of girls      0    1    2     3    4    5
No. of families   14   56   110   88   40   12

Is the result consistent with the hypothesis that male and female births are equally probable?

Hint: $H_0$: Male and female births are equiprobable. (Probability of a male child is 0.5.) $m = 5$; $p = 0.5$; $N = 320$.

No. of male births   O     E
0                    12    10
1                    40    50
2                    88    100
3                    110   100
4                    56    50
5                    14    10

$\chi^2_{cal} = 7.16$; $\chi^2_{0.05} = 11.07$ at 5 d.f. As $\chi^2_{cal} < \chi^2_{0.05}$, $H_0$ is accepted.

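5. Fit a Poisson distribution to the following data and test the goodness of fit.

x   0     1    2    3   4   5   6   Tot
f   273   70   30   7   7   2   1   390

Hint: $H_0$: Poisson distribution is a good fit. The mean is $\lambda = 195 / 390 = 0.5$.

x       O     E
0       273   236.4
1       70    118.2
2       30    29.5
3       7     4.9
4       7     0.6
5       2     0.1
6       1     0
Total   390   389.7

The classes 3 to 6 are pooled: O = 17 and E = 5.6, taken as 5.9 after adding 0.3 so that the total becomes 390.

After pooling, we have

O     E
273   236.4
70    118.2
30    29.5
17    5.9

$\chi^2_{cal} = 46.21$; $\chi^2_{0.05} = 5.99$ at $4 - 2 = 2$ d.f. Here $c = 2$ (1 for $\sum O = \sum E$, which is common in all cases, and 1 for estimating the parameter $\lambda$ from the observed distribution).

As $\chi^2_{cal} > \chi^2_{0.05}$, $H_0$ is rejected.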
