Lecture Notes in Statistics
Edited by J. Berger, S. Fienberg, J. Gani, K. Krickeberg, I. Olkin, and B. Singer
66
Tommy Wright
Exact Confidence Bounds when Sampling from Small Finite Universes: An Easy Reference Based on the Hypergeometric Distribution
Springer-Verlag Berlin Heidelberg New York London Paris
Tokyo Hong Kong Barcelona Budapest
Author
Tommy Wright Mathematical Sciences Section Oak Ridge National Laboratory Oak Ridge, TN 37831-6367, USA
Mathematical Subject Classification: 62D05, 62Q05, 60C05
ISBN-13: 978-0-387-97515-3
DOI: 10.1007/978-1-4612-3140-0
e-ISBN-13: 978-1-4612-3140-0
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. Duplication of this publication or parts thereof is only permitted under the provisions of the German Copyright Law of September 9, 1965, in its current version, and a copyright fee must always be paid. Violations fall under the prosecution act of the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1991
Softcover reprint of the hardcover 1st edition 1991
2847/3140-543210 - Printed on acid-free paper
What is more beautiful than a simple and important
question with a simple and exact answer that is easy
to provide?
To Marsha, Taunya, Tommy II, and Tracy
PREFACE
There is a very simple and fundamental concept to much of probability and statistics that
can be conveyed using the following problem.
PROBLEM. Assume a finite set (universe) of N units where A of the
units have a particular attribute. The value of N is known while the
value of A is unknown. If a proper subset (sample) of size n is
selected randomly and a of the units in the subset are observed to have
the particular attribute, what can be said about the unknown value of
A?
The problem is not new and almost anyone can describe several situations where a particular
problem could be presented in this setting. Some recent references with different focuses
include Cochran (1977); Williams (1978); Hajek (1981); Stuart (1984); Cassel, Särndal, and
Wretman (1977); and Johnson and Kotz (1977). We focus on confidence interval estimation of A. Several methods for exact confidence interval estimation of A exist (Buonaccorsi, 1987,
and Peskun, 1990), and this volume presents the theory and an extensive Table for one of
them.
One of the important contributions in Neyman (1934) is a discussion of the meaning of
confidence interval estimation and its relationship with hypothesis testing which we will call
the Neyman Approach. In Chapter 3 and following Neyman's Approach for simple random
sampling (without replacement), we present an elementary development of exact confidence interval estimation of A as a response to the specific problem cited above. Buonaccorsi (1987)
notes that the exact methods under simple random sampling of Konijn (1973, p. 79) and Katz
(1953) appear to be the same as the result obtained from the Neyman Approach which Buonaccorsi refers to as the T Method. Because the Neyman Approach in our case for
one-sided confidence bounds of A is based on inverting a family of uniformly most powerful
one-sided tests of hypotheses for A, the resulting one-sided confidence bounds (upper and
lower) are uniformly most accurate as noted by Buonaccorsi (1987). That is, the Neyman
Approach leads to the shortest one-sided confidence intervals for the stated confidence levels.
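
As a rough illustration of the Neyman Approach described above, the following Python sketch inverts the two families of one-sided hypergeometric tests by direct search. The function names, the strict inequality used in the acceptance condition, and the brute-force loop are assumptions made for this illustration; they are not taken from this volume or from the cited papers.

```python
from math import comb

def hyper_pmf(x, N, A, n):
    """P(X = x) when n units are drawn without replacement from N units,
    A of which have the attribute."""
    if x < 0 or x > n or x > A or n - x > N - A:
        return 0.0
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

def upper_bound(N, n, a, alpha):
    """Largest A whose lower-tail probability P(X <= a | A) still exceeds
    alpha (assumed convention)."""
    best = a
    for A in range(a, N - (n - a) + 1):  # values of A consistent with the sample
        if sum(hyper_pmf(x, N, A, n) for x in range(a + 1)) > alpha:
            best = A
    return best

def lower_bound(N, n, a, alpha):
    """Smallest A whose upper-tail probability P(X >= a | A) exceeds
    alpha (assumed convention)."""
    for A in range(a, N - (n - a) + 1):
        if sum(hyper_pmf(x, N, A, n) for x in range(a, n + 1)) > alpha:
            return A
    return N - (n - a)  # not reached: the tail probability equals 1 at the last A

print(upper_bound(10, 5, 0, 0.05))  # 3
print(lower_bound(10, 5, 5, 0.05))  # 7
```

With N = 10 and n = 5, observing a = 0 gives a 95% upper bound of 3 because P(X = 0 | A = 3) = 21/252 exceeds .05 while P(X = 0 | A = 4) = 6/252 does not. Under this convention the lower bound for a equals N minus the upper bound for n - a.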
Under simple random sampling, exact confidence intervals for A can be constructed using
the hypergeometric probability distribution (Chapter 3). Chung and DeLury (1950) provide
charts for two-sided confidence limits based on the hypergeometric distribution for N = 500;
2500; and 10000. Buonaccorsi (1987) notes that their method is similar to a method described
by Sukhatme, Sukhatme, Sukhatme, and Asok (1984, p. 46) and that it does not always lead to
a solution, particularly for small values of N.
Perhaps the most familiar method for constructing exact confidence intervals of A is given
in Cochran (1977, p. 57). However, the method presented in Cochran's book is not the same
as that of the Neyman Approach. In fact, Buonaccorsi shows for most observed values of a that Cochran's method leads to longer confidence intervals for A than those obtained by the
Neyman Approach. The difference in length for one-sided intervals does not exceed 1.
To obtain an exact confidence bound for A under simple random sampling requires
extensive computing of hypergeometric probabilities. For this reason, approximations of
confidence bounds (for example, based on the binomial, Poisson, or normal distributions) are
frequently used (Cochran, 1977). For certain combinations of N , n, a, and confidence level,
these approximations are not good and can lead to incorrect inferences about the value of A .
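
A small numerical sketch may help show how an approximation can fail. The Python code below compares an exact upper bound, found by direct search over the hypergeometric distribution, with a normal-approximation upper bound. The particular normal formula (finite population correction, no continuity correction), the function names, and the example values N = 50, n = 10, a = 0 are assumptions chosen for illustration, not formulas quoted from Cochran (1977) or from this volume.

```python
from math import comb, sqrt
from statistics import NormalDist

def lower_tail(a, N, A, n):
    """P(X <= a) for the hypergeometric distribution with parameters N, A, n."""
    total = 0
    for x in range(max(0, n - (N - A)), min(a, A) + 1):
        total += comb(A, x) * comb(N - A, n - x)
    return total / comb(N, n)

def exact_upper(N, n, a, alpha):
    """Largest A with P(X <= a | A) > alpha (assumed convention)."""
    return max(A for A in range(a, N - (n - a) + 1)
               if lower_tail(a, N, A, n) > alpha)

def normal_upper(N, n, a, alpha):
    """Assumed normal approximation with finite population correction."""
    p = a / n
    z = NormalDist().inv_cdf(1 - alpha)
    se = sqrt((1 - n / N) * p * (1 - p) / (n - 1))
    return N * (p + z * se)

N, n, a, alpha = 50, 10, 0, 0.05
print(normal_upper(N, n, a, alpha))  # 0.0: the estimated standard error vanishes when a = 0
print(exact_upper(N, n, a, alpha))   # a strictly positive bound
```

When no sampled unit has the attribute, this normal approximation reports an upper bound of zero, which would claim with 95% confidence that the remaining 40 unsampled units contain no units with the attribute. The exact search does not make that claim.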
The computer made possible the publication of a table of the hypergeometric
probability distribution by Lieberman and Owen (1961) for N varying from 1 through 100;
N = 1000; and N = 2000. Although the table is extensive, it can be difficult, and in some cases
impossible, to obtain exact confidence bounds for A from it.
Tomsky, Nakano, and Iwashika (1979) give a table of upper confidence bounds using the
Neyman Approach for N = 2, 3, 6, 7, 8, 9, 10, 20, 30, 40, 50, and 100. Odeh and Owen (1983)
give a table of upper and lower confidence bounds using the method of Cochran (1977) for
N = 400; 600; 800; 1000; 1200; 1400; 1600; 1800; and 2000.
Currently, some statistical packages contain functions that generate hypergeometric
probabilities which can then be used to generate exact confidence bounds for A under simple
random sampling. For example, Alexander (1986) presents an interactive macro using the
Statistical Analysis System (SAS) function PROBHYPR to produce upper confidence bounds
for A using the Neyman Approach.
In spite of these recent computing developments, the existence of theory, and the ability to
produce exact confidence bounds for A, exact results are rarely given in practice. Why
continue to use and teach approximations, including ones that yield bad results for certain
cases, for such a common and simple problem when exact and simple methods can be used?
The purpose of this volume is to provide a complete and elementary development of the
details behind these confidence bounds and to provide an extensive Table (see
Application I, p. vii) of optimal upper and lower confidence bounds for A that is easy to
understand and use. It is primarily intended to be a quick and easy reference for a large group
of users including consulting and research statisticians, practitioners involved in acceptance
sampling type applications, scientists, auditors, engineers, quality control and quality
assurance personnel (especially those engaged in manufacturing settings), government
officials (especially those involved in the collection of data from institutions or
establishments at local, state, and federal levels), managers, social scientists, education
administrators, environment-wildlife-forestry related workers, marketing agencies, health
administrators, economists, personnel managers, etc. Indeed, anyone who has reason to select
a sample from a finite universe and construct a confidence interval or test a hypothesis will
find great use for this volume. Also as mentioned earlier, this volume is instructive and can be
a valuable supplement to courses in sampling techniques and methodology which tend to
devote little or no time to exact methods when sampling from finite universes. In addition to
the elementary development given in Chapter 3, on pages vi-viii, eight specific applications
of the Table of confidence bounds are listed, including tests of hypotheses, guidance for
determination of sample size n, construction of conservative confidence bounds under
stratified random sampling, and conservative comparisons of two separate universes. These
applications and the use of the Table are discussed with examples in Chapter 2 and can be
used without reference to the theory and development in Chapter 3. The Table is given in
Chapter 4.
The Table in this volume was produced on an IBM 3033 computer. I am grateful that
permission was granted to use, in the computer program, the function PROBHYPR which is
part of the SAS® System, a product from SAS Institute Inc., Cary, North Carolina.
I am also grateful to the Naval Facilities Engineering Command, Department of the Navy,
U.S. Department of Defense for initial funding on a project related to the sampling of housing
units at Navy Installations around the world which led to the beginning of this work.
Additional support to complete the work came from the Applied Mathematical Sciences
Program in the Office of Energy Research of the U.S. Department of Energy, under contract
number DE-AC05-84OR21400 with Martin Marietta Energy Systems, Inc. to operate Oak
Ridge National Laboratory.
My sincere thanks to the following individuals for independent reviews, encouragement,
and for helpful suggestions: John Beauchamp, Kimiko O. Bowman, John P. Buonaccorsi, and
How J. Tsao. Kimiko Bowman and How Tsao each produced separate and independent
computer programs which confirm the computational results given in Chapter 4. It was a
personal joy that I was able to excite one of my students, Paula Baker, at Knoxville College
about statistics by having her proofread an early draft.
Finally, the production of this work would have been impossible without the valuable
assistance of three other members of the Mathematical Sciences Section at Oak Ridge
National Laboratory: Rhonda Harbison and Tammy Darland for typing and retyping the many
drafts of this volume, excluding Chapter 4, and Elmon Leach for the programming that
produced the extensive Table in Chapter 4. I am indeed grateful for their expert support and
patience.
Tommy Wright
A NOTE TO USERS
What is the purpose of this volume?
This volume is of particular interest to anyone who studies solutions to or faces problems
of the following type.
Setting and Problem. Assume a finite universe (population, lot, or urn) of
N units of which an unknown number A (or unknown proportion P = A/N) has a particular attribute or characteristic. If a sample of size n is selected from the
entire universe and a of the sample units are observed to have the particular attribute or characteristic, what can be said about the value of A (or P)?
If the sample is a simple random sample, then this work can be used to
easily find exact one-sided and two-sided confidence bounds for A (or P) for
small values of N. The extent of the Table is indicated under Application I of
question 2 below. Exact tests of hypotheses and sample size determination for
estimation under simple random sampling can also be facilitated using this
volume. Conservative confidence intervals under stratified random sampling
can also be obtained.
Indeed major objectives of this volume are to be instructive and to provide an easy to use
reference. In order to increase usefulness, allow for flexibility, decrease the chance of the need for approximations, and provide exact results, a table that is responsive to the above setting and problem must be extensive to accommodate the many possible combinations of N,
n, a, and confidence levels that are most likely to be encountered, particularly with small
universes. An attempt has been made to provide exact bounds under simple random sampling for those cases where the approximations are generally not good, i.e., for small values of N,
for small values of n relative to N, and for small and large values of a relative to n.
What specific problems can be solved using this volume?
Eight possible applications of the Table in this volume are listed. Each application is discussed in Chapter 2 with examples.
Application I. Exact 100(1 - α)% one-sided lower and upper confidence bounds for A
under simple random sampling can be found easily in the Table, where 1 - α is either .975 or .95, for the following combinations of N, n, and a, where
N = the number of units in the finite universe,
n = the number of units in the simple random sample, and
a = the number of units in the sample with a particular attribute or characteristic.
The Table in Chapter 4 has six sections.

Table Section   N (1),(2)        n          a (3)     a (4)     Pages
4.1             2(1)50           1(1)…      0(1)…     …(1)n     58 to 76
4.2             52(2)100         1(1)…      0(1)…     …(1)n     77 to 116
4.3             105(5)200        1(1)…      0(1)…     …(1)n     117 to 190
4.4             210(10)500       1(1)…      0(1)…     …(1)n     191 to 339
4.5             600(100)1000     1(1)60     0(1)…     …(1)n     340 to 378
                                 62(2)…     0(1)…     …(1)n
4.6             1100(100)2000    2(2)…      0(1)…     …(1)n     379 to 426

(1) 2(1)50 means that N varies from 2 to 50 in steps of 1.
(2) 52(2)100 means that N varies from 52 to 100 in steps of 2.
(3) Displayed in Table.
(4) From Table by subtraction using (2.5) and (2.6).
Actually, the 100(1 - α)% one-sided lower and upper confidence bounds that are provided are
the best in the sense that they give the shortest possible intervals for the given confidence level
1 - α.
Application II. Exact 100(1 - α)% two-sided confidence bounds for A under simple
random sampling can be found easily for given N, n, and a by using appropriate lower and
upper one-sided confidence bounds. For example, 95% two-sided confidence bounds can be
obtained using the 97.5% lower confidence bound with the 97.5% upper confidence bound for
the given N , n , and a.
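
The pairing of two one-sided bounds described in Application II can be sketched in Python as follows. Here the bounds are computed by direct search rather than read from the Table; the helper names, the default α = .05, and the strict inequalities are assumptions made for this illustration.

```python
from math import comb

def tail(lo, hi, N, A, n):
    """P(lo <= X <= hi) for X hypergeometric with parameters N, A, n."""
    lo = max(lo, 0, n - (N - A))
    hi = min(hi, n, A)
    return sum(comb(A, x) * comb(N - A, n - x) for x in range(lo, hi + 1)) / comb(N, n)

def two_sided(N, n, a, alpha=0.05):
    """Combine a 100(1 - alpha/2)% lower bound with a 100(1 - alpha/2)% upper bound."""
    candidates = range(a, N - (n - a) + 1)  # values of A consistent with the sample
    lower = min(A for A in candidates if tail(a, n, N, A, n) > alpha / 2)
    upper = max(A for A in candidates if tail(0, a, N, A, n) > alpha / 2)
    return lower, upper

print(two_sided(10, 5, 0))  # (0, 3)
print(two_sided(10, 5, 5))  # (7, 10)
```

Each one-sided bound is allowed an error probability of α/2 = .025, so the two together give a confidence level of at least 95%.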
Application III. Conservative but useful confidence bounds for A when N is not in this
volume but is between two values of N that are in this volume can be obtained easily. Similar
results can be obtained when a particular n is not in this volume but is between two values of
n that are.
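Application IV. Exact one- and two-tailed α level tests under simple random sampling of
the hypotheses H0: A = A0, H0: A ≤ A0, and H0: A ≥ A0 (against Ha: A ≠ A0, Ha: A > A0, and Ha: A < A0, respectively)
can be performed easily, for various values of α including α = .025, .05, .1, etc.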
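
One way to picture such exact tests is through hypergeometric tail probabilities evaluated at the hypothesized value A0. The Python sketch below is an illustration only: the function names, the string labels for the alternatives, and the rule of splitting α equally for the two-tailed test are assumptions, and the volume itself carries out these tests through the Table (Chapter 2).

```python
from math import comb

def tail(lo, hi, N, A, n):
    """P(lo <= X <= hi) for X hypergeometric with parameters N, A, n."""
    lo = max(lo, 0, n - (N - A))
    hi = min(hi, n, A)
    return sum(comb(A, x) * comb(N - A, n - x) for x in range(lo, hi + 1)) / comb(N, n)

def reject(N, n, a, A0, alpha, alternative):
    """Exact test of a hypothesis about A at level alpha.

    alternative = "greater":   H0: A <= A0 against Ha: A > A0
    alternative = "less":      H0: A >= A0 against Ha: A < A0
    alternative = "two-sided": H0: A = A0 against Ha: A != A0
    """
    p_upper = tail(a, n, N, A0, n)  # P(X >= a | A0)
    p_lower = tail(0, a, N, A0, n)  # P(X <= a | A0)
    if alternative == "greater":
        return p_upper <= alpha
    if alternative == "less":
        return p_lower <= alpha
    return min(p_upper, p_lower) <= alpha / 2

print(reject(10, 5, 5, 6, 0.05, "greater"))  # True:  P(X >= 5 | A = 6) = 6/252
print(reject(10, 5, 5, 7, 0.05, "greater"))  # False: P(X >= 5 | A = 7) = 21/252
```

Under these conventions the one-sided test rejects A0 exactly when A0 falls outside the corresponding one-sided confidence bound, which is why a table of bounds can serve for testing as well.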
Application V. This volume can be used to determine the sample size n needed to estimate
A under simple random sampling without appealing to assumptions of normality (or some
other approximation) for any statistic.
Application VI. The analogous exact inferences and procedures noted in Applications I, II,
III, IV, and V for A can also be performed for P the universe (population) proportion under
simple random sampling.
Application VII. Conservative confidence bounds (both one- and two-sided) of A (or P)
for certain values of 1 - α can be obtained under stratified random sampling with four or less
strata. Hence, this volume can be used for much larger universe sizes with the use of
stratification as long as the number of units in each stratum does not exceed 2000.
Application VIII. Conservative confidence bounds for the difference A' - A'' (or P' - P'')
can also be provided when comparing two different universes.
What is meant by "exact" when sampling from finite universes?
Under simple random sampling without replacement from a finite universe, the
hypergeometric probability distribution is an appropriate distribution on which to base
statistical inferences. Because the hypergeometric probability distribution is discrete, it will be
rare that the confidence level 1 - α will be exactly equal to the actual coverage probabilities,
for example, as illustrated in Table 3.5 of Example 3.8 on page 51. However, the actual
coverage probabilities for the results in this volume will always be at least that of the stated
confidence level 1 - α, and the coverage probabilities will be as close to the stated confidence
level as possible using the hypergeometric distribution. Thus, when the phrases "exact
confidence bounds" or "exact confidence intervals" are used, they are referring to the use of
the hypergeometric distribution under simple random sampling instead of an approximation of
the hypergeometric distribution and to the fact that the actual coverage probability for our
confidence statement will always be at least the stated confidence level and that the excess
probability will be as small as possible.
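
The sense of "exact" described above can be checked numerically for a small universe. The Python sketch below computes the actual coverage probability of a one-sided upper bound for every possible true value of A. The function names, the strict-inequality convention for the bound, and the example values N = 10, n = 5, α = .05 are assumptions made for this illustration; it does not reproduce Table 3.5.

```python
from math import comb

def pmf(x, N, A, n):
    """P(X = x) for X hypergeometric with parameters N, A, n."""
    if x < max(0, n - (N - A)) or x > min(n, A):
        return 0.0
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

def upper_bound(N, n, a, alpha):
    """Largest A with P(X <= a | A) > alpha (assumed convention)."""
    return max(A for A in range(a, N - (n - a) + 1)
               if sum(pmf(x, N, A, n) for x in range(a + 1)) > alpha)

def coverage(N, n, A, alpha):
    """Probability that the upper bound is at least A when A is the true value."""
    return sum(pmf(x, N, A, n) for x in range(n + 1)
               if upper_bound(N, n, x, alpha) >= A)

N, n, alpha = 10, 5, 0.05
for A in range(N + 1):
    # Each coverage value is at least 1 - alpha, and is seldom equal to it.
    print(A, round(coverage(N, n, A, alpha), 4))
```

For example, when the true value is A = 4 the bound fails to cover only if a = 0 is observed, which has probability 6/252, so the actual coverage is 246/252 rather than exactly .95.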
Finally, the user should note the following point.
• While Applications V, VII, and VIII are theoretically correct, it is not clear that one cannot
provide better results for the finite population setting. Research is underway in search of
better results, and it is expected that others may provide better answers in the future
through the sampling theory literature.
TABLE OF CONTENTS
PREFACE
A NOTE TO USERS
    What is the purpose of this volume?
    What specific problems can be solved using this volume?
    What is meant by "exact" when sampling from finite universes?
1. INTRODUCTION
2. THE APPLICATIONS
    2.1. Application I. Exact 100(1 - α)% One-Sided Upper and Lower Confidence Bounds for A Under Simple Random Sampling
        Application I.1. 100(1 - α)% Upper Confidence Bound for A
        Application I.2. 100(1 - α)% Lower Confidence Bound for A
        Application I.3. When a Particular Value of α Is Not in the Table
    2.2. Application II. Exact 100(1 - α)% Two-Sided Confidence Bounds for A Under Simple Random Sampling
    2.3. Application III. Conservative Confidence Bounds for A Under Simple Random Sampling when N0 Is Not in the Table, but N0 Is Between Two Other Values of N That Are
        Application III.1. When a Particular Value n0 Is Not in the Table
    2.4. Application IV. Exact One- and Two-Sided α Level Tests of Hypotheses Under Simple Random Sampling
        Application IV.1. To Test H0: A = A0 Against Ha: A ≠ A0
        Application IV.2. To Test H0: A ≤ A0 Against Ha: A > A0
        Application IV.3. To Test H0: A ≥ A0 Against Ha: A < A0
    2.5. Application V. Sample Size Determination Under Simple Random Sampling
    2.6. Application VI. The Analogous Exact Inferences and Procedures of Applications I, II, III, IV, and V Can All Be Performed for P, the Universe (Population) Proportion, Under Simple Random Sampling
    2.7. Application VII. Conservative Confidence Bounds Under Stratified Random Sampling with Four or Less Strata
    2.8. Application VIII. Conservative Comparison of Two Universes
3. THE DEVELOPMENT AND THEORY
    3.1. Exact Hypothesis Testing for a Finite Universe
    3.2. Exact Confidence Interval Estimation for a Finite Universe
    3.3. Some Additional Results On One-Sided Confidence Bounds
4. THE TABLE OF LOWER AND UPPER CONFIDENCE BOUNDS
    4.1. N = 2(1)50
    4.2. N = 52(2)100
    4.3. N = 105(5)200
    4.4. N = 210(10)500
    4.5. N = 600(100)1000
    4.6. N = 1100(100)2000
APPENDIX. A SAS MACRO FOR GENERATING EXACT ONE-SIDED LOWER AND UPPER CONFIDENCE BOUNDS FOR A FOR STATED N, n, a, AND 1 - α
REFERENCES
INDEX