Chapter 1: Definition of Basic Concepts

Upload: stephen-kingscrown-favour-mathame

Post on 04-Jun-2018


  • 8/13/2019 CHAPTER 1 Definition of Basic Concepts


    DEFINITIONS OF BASIC CONCEPTS OF SAMPLE SURVEYS [1]

    CONTENTS

    1. INTRODUCTION
    2. POPULATION
    3. TARGET POPULATION
    4. SURVEY POPULATION
    5. CENSUSES
    6. SAMPLING
    7. SAMPLING UNIT
    8. SAMPLING FRAME
    9. PARAMETER
    10. STATISTIC
    11. ESTIMATOR/ESTIMATE
    12. UNBIASED ESTIMATOR
    13. ACCURACY
    14. PRECISION
    15. EFFICIENCY or RELATIVE EFFICIENCY
    16. SAMPLING FRACTION AND WEIGHT
    17. STANDARD ERROR
    18. SAMPLING AND NON-SAMPLING ERRORS
    19. COEFFICIENT OF VARIATION or RELATIVE ERROR
    20. MARGIN OF ERROR

    1. INTRODUCTION

    In this lecture the basic concepts and definitions of sampling theory and estimation are discussed. A thorough grasp of these ideas is necessary to facilitate a clear understanding of the survey methods considered in the subsequent lectures.

    2. POPULATION

    The population is the group of people or entities to which findings are to be generalized. The population must be defined explicitly before a sample is taken.

    This group is defined by the question being asked, i.e. the objective of the study.

    Example 1: Do UB students enrolled during 2011-12 get a stipend?

    How many populations are of interest? One

    What is the population of interest? All current UB students

    Example 2: Is the IQ of female students the same as the IQ of male students in UB?

    How many populations are of interest? Two

    What are the populations of interest? All female students and all male students

    [1] Prepared by Dr. V. K. Dwivedi, Department of Statistics, UB, for the STA 354 course.

    It is essential to define the population in terms of:

    Content refers to the definition of the type and characteristics of the elements which comprise the population, e.g. a list of establishments or a list of households.

    Extent refers to geographic boundaries as they relate to coverage, e.g. a list of establishments or a list of households [2] in Gaborone.

    Time refers to the time period to which the population refers, e.g. a list of establishments or a list of households in Gaborone in the 2001 Census.

    Remark: It is to be noted that the term population is used in the statistical sense, i.e. to denote

    the aggregate of units to which the survey results are to apply. It need not, though of course it

    often does, refer to a population of human beings. An alternative term is universe.

    3. TARGET POPULATION

    The target population is the population of elements which we want to investigate by means of a sample survey (i.e. the population required to meet the survey objectives).

    For example: the population aged 12 years and above, children below 5 years, etc.

    4. SURVEY POPULATION

    The survey (or study) population is the population actually covered by the survey or, better still, the population we have access to by means of the sampling frame. Ideally the target and survey populations will be the same, but for practical reasons they may not be identical. The survey population is usually a subset of the target population.

    Remark: When the target and survey populations are not identical, the results of the sample should be generalized to the survey population. Generalizing the sample results to the target population may only be done with some degree of caution.

    Examples

    1. Many national surveys in Botswana would ideally include persons in hospitals, hotels, prisons, army barracks and other institutions. However, the severe problems involved in collecting responses from such persons frequently lead to their exclusion from the survey population. The advantage of starting with the ideal target population is that the exclusions are explicitly identified, thus enabling the magnitude and consequences of the restrictions to be assessed.

    2. In the HIES survey, one of the main objectives was to measure the average income and expenditure of all households in Botswana (target population), but the survey population comprised those households residing in private dwellings. Thus in this instance generalizing the sample results to all households in Botswana should be done with some degree of caution.

    [2] Household: A household consists of one or more persons, related or unrelated, living together "under the same roof" in the same lolwapa, eating together "from the same pot" and/or making common provision for food and other living arrangements.

    5. CENSUSES

    Censuses are collections of data from every person or entity in the population.

    6. SAMPLING

    Sampling is the selection of a part (sample) from a population, observing some characteristics of interest and then drawing some conclusions (inference) about the parent population.

    Because statements based on samples are always probability statements, it is important to know the underlying principles and the limitations of such results. The study of the methods of collecting and analyzing data through samples is termed sampling theory.

    In summary, the main objective of sampling is to estimate certain population parameters (mean, total, proportion) using statistics derived from the sample.

    7. SAMPLING UNIT

    The sampling unit refers to any potential member of the sample at the appropriate stage of selection. It is important that the sampling units are clearly defined, since their nature may affect the usefulness of different sampling methods.

    For example: (i) In a single-stage sample survey on housing needs, houses may be used as sampling units, but in making such a choice it is important that a complete list (sampling frame) of houses from which to draw the sample exists. (ii) In a two-stage design, each primary sampling unit (PSU; an Enumeration Area [3]) comprises secondary sampling units (SSU; households), i.e. the sampling unit depends on the selection stage.

    8. SAMPLING FRAME

    A complete list of the sampling units which represent the population to be covered is called the sampling frame.

    9. PARAMETER

    The population parameter is the summary value of the characteristics (variables/attributes) of the population that one is trying to estimate using the sample.

    In the HIES, for instance, we measure household income, and thus the mean household income for all households in Botswana is a parameter.

    10. STATISTIC

    Any function of the sample values is called a statistic.

    For example, the mean household income calculated from the sample is a statistic. In general, we use a statistic to estimate a population parameter.
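The distinction between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population of 1,000 household incomes is synthetic, and the names `population`, `sample` and the seed value are assumptions introduced for illustration, not HIES data.

```python
import random
import statistics

# Hypothetical population: synthetic monthly incomes for 1,000 households.
rng = random.Random(354)
population = [rng.lognormvariate(8.0, 0.6) for _ in range(1000)]

# Parameter: a summary of the whole population (unknown in a real survey).
population_mean = statistics.fmean(population)

# Statistic: the same summary computed from a simple random sample
# of 100 households drawn without replacement.
sample = rng.sample(population, 100)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

In a real survey only `sample_mean` would be observable; the population mean is computed here solely because the population is simulated.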

    [3] Enumeration Areas: An Enumeration Area (EA) is the smallest geographic unit, which represents an average workload for an enumerator over a specified period. The average size of an EA is approximately 120-150 malwapa. An EA may be a whole locality (this is the case of a small village which is an EA by itself), a part of a locality (this is the case of a bigger village which has been divided into more than one EA) or a group of localities (this is the case of cattle posts, lands areas or freehold farms).


    11. ESTIMATOR/ESTIMATE

    The estimator is the method (e.g. the sample mean) of estimating the population parameter. An estimator is a random variate and may take different values from sample to sample.

    The value of the estimator obtained from any particular sample is called the estimate.

    12. UNBIASED ESTIMATOR

    An estimator is said to be unbiased if the mean of the estimates derived from all possible samples equals the population parameter, e.g. the population mean.
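For a very small population, unbiasedness of the sample mean can be checked directly by listing every possible sample. The sketch below assumes a hypothetical population of five values and simple random sampling without replacement with n = 2; both are illustrative choices, not taken from the lecture.

```python
from itertools import combinations
from statistics import fmean

population = [2, 4, 6, 8, 10]  # hypothetical values, N = 5
n = 2                          # sample size

# The estimator is the rule "take the sample mean";
# each value it produces for a particular sample is an estimate.
all_samples = list(combinations(population, n))
estimates = [fmean(s) for s in all_samples]

mean_of_estimates = fmean(estimates)   # average over all possible samples
population_mean = fmean(population)    # the parameter being estimated

print(f"number of possible samples: {len(all_samples)}")
print(f"mean of all estimates:      {mean_of_estimates}")
print(f"population mean:            {population_mean}")
```

Individual estimates range from 3 to 9, yet their average over all ten samples coincides with the population mean, which is what unbiasedness asserts.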

    13. ACCURACY

    The accuracy of a sample estimate refers to its closeness to the correct population value (parameter), i.e. the size of the deviation from the true mean $\mu$.

    $(\bar{y} - \mu) = 0$: accurate

    $(\bar{y} - \mu) \neq 0$: inaccurate

    Remark: Since the population value is not usually known, the accuracy of a sample estimate cannot usually be assessed. For this reason we usually talk of the precision of the estimate.

    14. PRECISION

    The precision of an estimate refers to the probable accuracy of the estimate. The probable accuracy is measured by the standard error of the estimate. All other things being equal, an estimate with a lower variation is more precise than one with greater variation.

    $$\text{Precision of estimate} \propto \frac{1}{V(\text{estimate})}$$

    15. EFFICIENCY or RELATIVE EFFICIENCY

    If, for a given sample size, one unbiased estimator has a lower variance than another, we say it is more efficient. When comparing the efficiency of sample designs for fixed sample sizes and outlay of resources, we compare the variances of the estimators. The more efficient of the two is the one with the lower variance.

    Define

    $V_1$ = variance of the complex design $= S_1^2/n$

    $V_2$ = variance of the SRS for the same sample size $= S_2^2/n$

    Thus, the efficiency of the complex design with respect to an SRS of the same size is

    $$\text{Efficiency } (E) = \frac{\text{Precision of complex design}}{\text{Precision of SRS}} = \frac{V_2}{V_1} = \frac{S_2^2/n}{S_1^2/n} = \frac{S_2^2}{S_1^2}$$

    There are three cases:

    (i) $E = 1$ means both designs are equally efficient.
    (ii) $E < 1$ means the complex design is less efficient than SRS.
    (iii) $E > 1$ means the complex design is more efficient than SRS.

    In general, efficiency is presented as a percentage.

    The percent gain in efficiency of the complex design over SRS is

    $$\text{Percent gain in efficiency } (GE) = \left(\frac{V_2}{V_1} - 1\right) \times 100$$
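A short numerical sketch of the efficiency and percent-gain calculations follows. It takes E as the ratio of the SRS variance (V2) to the complex-design variance (V1), which is how the definitions above are read here; the two variance values and the function names are hypothetical.

```python
def relative_efficiency(v_complex: float, v_srs: float) -> float:
    """E = V2 / V1: ratio of the SRS variance to the complex-design variance."""
    return v_srs / v_complex


def percent_gain(v_complex: float, v_srs: float) -> float:
    """GE = (V2 / V1 - 1) * 100."""
    return (v_srs / v_complex - 1.0) * 100.0


# Hypothetical variances of the same estimator under two designs of equal size.
v1 = 4.0  # complex design
v2 = 5.0  # simple random sampling (SRS)

E = relative_efficiency(v1, v2)
GE = percent_gain(v1, v2)

print(f"E  = {E:.2f}")    # E > 1: the complex design is more efficient
print(f"GE = {GE:.1f}%")
```

With these illustrative values the complex design has the smaller variance, so E exceeds 1 and the percent gain is positive; swapping the two variances would give E below 1 and a negative gain.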

    16. SAMPLING FRACTION AND WEIGHT

    The ratio of the size of the sample ($n$) to that of the population ($N$) is called the sampling fraction and is denoted by the letter $f$, that is, $f = n/N$. The inverse of this quantity, $w = 1/f = N/n$, is sometimes called the expansion, raising or weighting factor. It is the factor by which the sample results are expanded or raised to derive estimates of the population total. One other use of the sampling fraction is in the finite population correction, symbolized by $(1 - f)$.
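The quantities in this section are simple ratios, so a few lines of arithmetic make them concrete. The population size, sample size and sample total below are hypothetical numbers chosen for illustration.

```python
N = 2000  # hypothetical population size (e.g. households in the frame)
n = 100   # hypothetical sample size

f = n / N    # sampling fraction
w = N / n    # expansion (raising, weighting) factor, equal to 1 / f
fpc = 1 - f  # finite population correction

# Expanding a sample total to an estimate of the population total.
sample_total = 350  # hypothetical total of some characteristic in the sample
estimated_population_total = w * sample_total

print(f"f = {f}, w = {w}, fpc = {fpc}")
print(f"estimated population total = {estimated_population_total}")
```

Each sampled unit here "stands for" w = 20 units of the population, which is why the sample total is multiplied by w.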

    17. STANDARD ERROR

    The standard error is the measure of variability between all possible samples. It is the square root of the variance of the estimate, i.e. of the mean squared deviation of the estimate around its mean: $SE(\bar{y}) = \sqrt{Var(\bar{y})}$. The standard error plays numerous roles in sampling theory, viz. measuring the sampling error, constructing confidence intervals, determining sample size, etc.
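For a tiny population the standard error of the sample mean can be computed literally as the variability over all possible samples. The sketch below assumes a hypothetical five-value population and SRS without replacement with n = 2. The closed-form check, (1 - f) S^2 / n, is the usual textbook result for this design and is brought in here as an assumption for comparison; it is not derived in this lecture.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, pvariance, variance

population = [2, 4, 6, 8, 10]  # hypothetical values, N = 5
N, n = len(population), 2

# Sample mean of every possible sample of size n (without replacement).
sample_means = [fmean(s) for s in combinations(population, n)]

# Variance of the estimator: mean squared deviation of the sample means
# around their own mean, taken over all possible samples.
var_ybar = pvariance(sample_means)
se_ybar = sqrt(var_ybar)

# Cross-check with the textbook formula (1 - f) * S^2 / n,
# where S^2 is the population variance with the N - 1 divisor.
f = n / N
var_formula = (1 - f) * variance(population) / n

print(f"SE from all possible samples: {se_ybar:.4f}")
print(f"SE from the formula:          {sqrt(var_formula):.4f}")
```

In practice only one sample is available, so the standard error is estimated from that sample rather than enumerated as above.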

    18. SAMPLING AND NON-SAMPLING ERRORS

    The errors involved in the collection, processing and analysis of the data in a survey may be classified as: (i) sampling error, and (ii) non-sampling error.

    SAMPLING ERROR

    The error which arises because only a sample is used to estimate the population parameter is termed sampling error or sampling fluctuation. Whatever the degree of care in selecting a sample, there will always be a difference between the population value (parameter) and its corresponding estimate.

    It is evaluated statistically. It is measured in terms of the standard error of a particular statistic (mean, total, proportion, etc.).

    This error can be reduced by increasing the size of the sample. In fact, the sampling error is inversely proportional to the square root of the sample size:

    $$\text{Sampling error} \propto \frac{1}{\sqrt{\text{sample size}}}$$

    The relationship can be examined graphically.

    [Figure: sampling error plotted against sample size]
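The inverse square-root relationship can also be tabulated. This sketch assumes the simple form SE = S / sqrt(n) for the sample mean, ignoring the finite population correction, with a hypothetical population standard deviation S = 10.

```python
from math import sqrt

S = 10.0                      # hypothetical population standard deviation
sizes = [25, 100, 400, 1600]  # each sample size is four times the previous one

# Standard error of the sample mean under the S / sqrt(n) form (fpc ignored).
standard_errors = {n: S / sqrt(n) for n in sizes}

for n, se in standard_errors.items():
    print(f"n = {n:5d}  SE = {se:.2f}")
```

Under this assumption, quadrupling the sample size only halves the sampling error, which is why large gains in precision are expensive.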


    Remark: When a sample survey becomes a census (complete enumeration), the sampling error becomes zero.

    NON-SAMPLING ERROR

    Besides sampling error, the sample estimate may be subject to other errors which, grouped together, are termed non-sampling error. The main sources of non-sampling error are:

    i. Failure to measure some of the units in the selected sample;
    ii. Observational errors;
    iii. Errors introduced in editing, coding and tabulating the results.

    Remark: The non-sampling error is likely to increase with increase in sample size, while

    sampling error decreases with increase in sample size.

    19. COEFFICIENT OF VARIATION or RELATIVE ERROR

    The coefficient of variation (CV) is defined as 100 times the coefficient of dispersion based upon the standard deviation:

    $$CV = \frac{s}{\bar{y}} \times 100$$

    i.e. the CV is the percentage variation in the mean, with the standard deviation $s$ taken as the total variation in the mean. It is a good statistic for comparing the variability of two series. The series having the greater CV is said to be more variable than the other.
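A brief sketch of comparing two series by their CV follows. The two series are hypothetical, and the use of the sample standard deviation (n - 1 divisor, via `statistics.stdev`) is an assumption, since the text does not say which divisor is intended.

```python
from statistics import fmean, stdev


def coefficient_of_variation(values):
    """CV = 100 * s / mean, using the sample standard deviation (n - 1 divisor)."""
    return 100.0 * stdev(values) / fmean(values)


# Two hypothetical series with the same spread but different means.
series_a = [10, 12, 14, 16, 18]
series_b = [100, 102, 104, 106, 108]

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)

print(f"CV of series A: {cv_a:.2f}%")
print(f"CV of series B: {cv_b:.2f}%")
```

Both series have the same standard deviation, but series A is more variable relative to its mean, so it has the larger CV.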

    It is also of interest to note that the population coefficient of variation (CV) is usually fairly stable over time and over characteristics of a similar nature. This stability of the CV makes it possible to determine the sample size for estimating a parameter with a specified margin of error. This is one of the many uses of the coefficient of variation.

    20. MARGIN OF ERROR

    The margin of error is a common summary of sampling error, which quantifies the uncertainty about a survey result. The margin of error can be interpreted by making use of ideas from the laws of probability.

    Example: A researcher wishes to estimate the percentage of people belonging to blood group O in a particular region and wants to know the size of the sample required to conduct a small sample survey using SRS. The next question is how accurately the researcher wishes to know the percentage of people with blood group O. The researcher will be content if the percentage is correct within ±5% (the margin of error), in the sense that if the sample shows 45% to have blood group O, the percentage for the region is sure to lie between 40% and 50%.
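The blood-group example can be turned into a rough sample-size calculation. The sketch below relies on assumptions that the lecture does not state: a normal approximation, a 95% confidence level, no finite population correction, and the usual formula n = z^2 p(1 - p) / e^2 for a proportion under SRS. The function name `srs_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def srs_sample_size(p: float, margin: float, confidence: float = 0.95) -> int:
    """Approximate SRS sample size for estimating a proportion p within +/- margin.

    Assumes a normal approximation and ignores the finite population correction.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return ceil(z * z * p * (1 - p) / margin ** 2)


p_guess = 0.45  # anticipated proportion with blood group O (from the example)
margin = 0.05   # desired margin of error: 5 percentage points

n_required = srs_sample_size(p_guess, margin)
n_conservative = srs_sample_size(0.5, margin)  # worst case when p is unknown
lower, upper = p_guess - margin, p_guess + margin

print(f"required sample size:     {n_required}")
print(f"conservative sample size: {n_conservative}")
print(f"interval: {lower:.2f} to {upper:.2f}")
```

Under these assumptions the interval from 40% to 50% holds with roughly 95% confidence rather than with certainty; a higher confidence level or a smaller margin would require a larger sample.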