sampling of data

Sampling

Sampling is a part of our day-to-day life, and we often use it without realising it.

The purpose of sampling is to gather maximum information about the population under consideration at minimum cost, time and human power.

This is best achieved when the sample possesses all the characteristics of the population.

Objectives

i. To make an inference about an unknown parameter of a population from a sample drawn from it.

ii. To test a hypothesis relating to a population parameter.

Importance of Sampling

A housewife takes one or two grains of rice from the cooking pan and decides whether the rice is cooked or not.

If she took all the rice from the pan to test it, there would be no rice left to eat.

A quality controller tests a few items and decides whether the lot meets the desired specifications or not.

If he tested every item in the lot, and the test used up the item, no items would remain to reach the customer.

A pathologist takes a few drops of blood and tests for any change in the content.

If he drew all the blood from the body, the patient would die, and there would be no scope for further treatment.

All these situations emphasize the importance of sampling and show that sampling is inevitable, and gives satisfactory results, when:

the population is infinite

the survey area is wide

the results are required in a short time

resources such as money and skilled personnel are scarce

Advantages of Sampling over Complete Enumeration (Census)

Less time

Reduced cost

Greater accuracy

Greater scope.

Sampling is inevitable when:

Population is too large

Testing is destructive

Population is hypothetical

Limitations of Sampling

Sampling gives the best results only if:

The sampling units are drawn in a scientific manner;

An appropriate sampling technique is used; and

The sample size is adequate.

Sampling – Basic Concepts

Population (Universe): An aggregate of objects, animate or inanimate, under study in any statistical investigation.

Sample: A part (portion, segment, subset or subgroup) of the population.

Random Sample: A sample in which each and every unit of the population has the same probability or chance of being included.

Sampling: The process of learning about the population on the basis of a sample drawn from it.

Parameters: Population constants such as µ, σ, ρ etc.

Statistics: Measures such as x̄, s, r etc. based on sample observations.

Sampling Distribution: The distribution of a statistic such as x̄ or s across different samples.

Standard Error: The standard deviation of a sampling distribution.

Theoretical Basis of Sampling

(i) Law of Statistical Regularity

A moderately large number of items chosen at random from a population are almost sure, on average, to possess the characteristics of the larger group.

(ii) Law of Inertia of Large Numbers

The larger the size of the sample, the more accurate the results are likely to be.
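The effect of sample size on accuracy can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions: the population (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10), the two sample sizes and the helper name `spread_of_sample_means` are all hypothetical and not part of the original material. It draws many samples of each size and measures how much the sample means vary.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, repeats, rng):
    """Standard deviation of the means of `repeats` simple random samples."""
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 values, mean 50, standard deviation 10.
population = [rng.gauss(50, 10) for _ in range(10_000)]

small = spread_of_sample_means(population, 25, 500, rng)
large = spread_of_sample_means(population, 400, 500, rng)
print(f"spread of sample means, n = 25:  {small:.3f}")
print(f"spread of sample means, n = 400: {large:.3f}")
```

The spread for the larger sample size should come out clearly smaller, which is what the Law of Inertia of Large Numbers leads us to expect.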

Essentials of Sampling

Representativeness - Random Method of Selection

Adequacy - Size of the sample should be adequate

Independence - Independent selection of units

Homogeneity - No basic difference in the nature of the units of the universe and the sample

Types of Sampling

Subjective or Non-probability Sampling

Probability or Random Sampling

Mixed Sampling

Subjective or Non-Probability Sampling

If the sample is selected with a definite purpose in view, and the choice of sampling units depends entirely on the discretion of the investigator, the sampling is called subjective or non-probability sampling. Examples include purposive sampling, quota sampling, judgment sampling, convenience sampling and snowball sampling.

Probability or Random Sampling

Probability sampling is the scientific method of selecting samples according to some laws of chance, in which each and every unit of the population has a known, non-zero chance of being selected (an equal chance in the simplest case). Such methods are also called random sampling methods. Examples include Simple Random Sampling, Stratified Sampling, Systematic Sampling, Multi-Stage Sampling and Cluster Sampling.

Mixed Sampling

If the samples are selected partly according to some laws of chance and partly according to a fixed sampling rule, the method is termed mixed sampling.

Simple Random Sampling

• This method is based purely on probability and is the simplest form of Probability Sampling.

• The Simple Random Sampling (SRS) is the process of selection of a sample in such a manner that each and every unit of the population has an equal and independent chance of being included in the sample.

Methods

1. Lottery Method

2. Table of Random Numbers

i. Tippett’s (1927) random number tables (41,600 digits grouped into 10,400 sets of 4-digit numbers).

ii. Fisher and Yates (1938) table of random numbers (15,000 digits arranged into 1,500 sets of 10-digit numbers).

iii. Kendall and Babington Smith (1939) table of random numbers (100,000 digits grouped into 25,000 sets of 4-digit numbers).

iv. C.R. Rao, Mitra and Matthai (1966) table of random numbers (20,000 digits grouped into 5,000 sets of 4-digit numbers).

3. Use of Computer
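As a minimal sketch of the computer method, the example below draws a simple random sample without replacement using Python's standard `random` module. The sampling frame (units numbered 1 to 500), the sample size of 10 and the function name `simple_random_sample` are assumptions made only for illustration.

```python
import random


def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(units, n)  # sampling without replacement


# Hypothetical sampling frame: units numbered 1 to 500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 10, seed=42)
print(sorted(sample))
```

Passing a seed makes the draw repeatable, which helps when a selection has to be audited later; leaving it out gives a fresh draw each time.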

Simple Random Sampling Advantages

Sample selection is quite simple.

It is said to be more representative because each unit has an equal chance of being selected.

It is free from bias and prejudice.

Simple Random Sampling Disadvantages

The investigator has no control over the selection of the units for investigation.

Selection on a strictly random basis is difficult.

It is unsuitable for heterogeneous groups.

Stratified Random Sampling

In Stratified Random Sampling Method, the universe or the entire population is divided into a number of groups or strata.

Stratification variables include age, income group, residential area etc.

Units are selected from each stratum, either proportionately or disproportionately.
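Proportionate selection can be sketched as follows. This is an illustration under assumptions, not a prescribed procedure: the population of 1,000 people, the three income groups and the function name `stratified_sample` are hypothetical. Each stratum receives a share of the sample equal to its share of the population, and a simple random sample is then drawn inside each stratum.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, total_n, seed=None):
    """Proportionate stratified sample: a simple random sample inside each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding means the total may differ
        # slightly from total_n when the shares are not whole numbers.
        share = round(total_n * len(members) / len(units))
        share = min(max(share, 1), len(members))
        sample[name] = rng.sample(members, share)
    return sample


# Hypothetical population of 1,000 people tagged with an income group.
population = (
    [(i, "low") for i in range(500)]
    + [(i, "middle") for i in range(500, 800)]
    + [(i, "high") for i in range(800, 1000)]
)
chosen = stratified_sample(population, lambda unit: unit[1], total_n=50, seed=1)
for group, members in chosen.items():
    print(group, len(members))
```

With these assumed stratum sizes (500, 300 and 200) and a total sample of 50, the allocation works out to 25, 15 and 10 units. Disproportionate selection would replace the `share` calculation with allocations fixed by the investigator.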

Stratified Random Sampling Method Importance of Strata

In Stratified Random Sampling, the selection of the sample items depends upon the process of stratification. The following precautions are required.

Each stratum in the universe should be large enough in size.

The units within each stratum should be as homogeneous as possible.

Different variables involved in the study problem should not be considered.

There should be well defined and clear-cut stratification.

Stratified Random Sampling Method Advantages

It is easy to achieve representative character.

The investigator has greater control over the selection of the samples.

A unit can be replaced when a particular unit is inaccessible for the study.

Stratified Random Sampling Disadvantages

If stratification is not done properly, bias may creep in.

Because the strata are of unequal size, it is very difficult to attain the intended proportions by deliberate means.

If the strata are not clear-cut, it may be difficult to place cases in the correct stratum.

The sample becomes unrepresentative if the strata are weighted disproportionately.

Systematic Sampling

In some instances, the most practical way of sampling is to select every ‘i’th unit on a list of sampling units.

An element of randomness is introduced by using random numbers to pick the unit with which to start.

The remaining units of the sample are selected at fixed intervals, which is known as the Sampling Interval in the Systematic Sampling.
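The procedure can be sketched in a few lines. The example assumes a sampling interval of k = N // n (population size divided by sample size, rounded down), which is a common convention rather than something stated above; the frame of 100 numbered units and the function name `systematic_sample` are likewise hypothetical.

```python
import random


def systematic_sample(units, n, seed=None):
    """Pick a random start in the first interval, then every k-th unit after it."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and the number of units")
    rng = random.Random(seed)
    k = len(units) // n       # sampling interval (assumed convention)
    start = rng.randrange(k)  # random position within the first k units
    return [units[start + j * k] for j in range(n)]


# Hypothetical list of 100 sampling units numbered 1 to 100.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=7)
print(sample)
```

Only the starting unit is random; once it is chosen, the rest of the sample is fully determined by the interval.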

Systematic Sampling Advantages

The observations in Systematic Sampling are spread more evenly over the entire population.

It is an easier and less costly method of sampling and can be used conveniently in the case of large populations.

Disadvantages

If there is a hidden periodicity in the population, systematic sampling will prove to be an inefficient method of sampling.

Sampling may not be reliable if all the elements are not ordered in a manner representative of the total population.

Cluster Sampling

It is a type of sampling in which clusters of units, rather than individual elementary units, are selected.

A cluster usually refers to a particular geographical area, so a cluster sample is often also called an Area Sample.

The sample units are clustered using the concept of neighbourhood.
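A minimal sketch of the idea, assuming one-stage cluster sampling in which every unit of a selected cluster is surveyed. The six villages, their household labels and the function name `cluster_sample` are hypothetical and serve only to make the example concrete.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical clusters: six villages with five households each.
villages = {
    f"village_{c}": [f"{c}{i}" for i in range(1, 6)]
    for c in "ABCDEF"
}
selected = cluster_sample(villages, 2, seed=11)
for name, households in selected.items():
    print(name, households)
```

Randomness enters only at the level of the cluster, which is why fieldwork stays concentrated in a few areas.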

Cluster Sampling Advantages

Where the area of inquiry is wide, cluster sampling method is widely used.

The measurement of data can be accurate in cluster sampling.

It brings flexibility in sampling.

In cluster sampling the fieldwork is localized or concentrated. As a result, the field cost of collecting the data is comparatively low and the fieldwork period is shorter.

Cluster Sampling Disadvantages

It is generally less precise than other probability sampling methods for the same sample size.

It is a very complex and complicated method.

Estimates of parameters and their standard errors are somewhat difficult when the clusters are of unequal sizes.

Multi-Stage Sampling

In this design various stages of selection are involved. It is appropriate where the population is scattered over a wide geographical area and no sampling frame is available.

It is useful when a sample is to be made within a limited time and cost budget.
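The simplest case, two stages, can be sketched as below. This is an assumed illustration: the eight districts of 40 households each and the function name `two_stage_sample` are hypothetical. First-stage units (districts) are drawn at random, and a simple random sample of second-stage units (households) is then drawn inside each selected district.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    rng = random.Random(seed)
    first_stage = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in first_stage
    }


# Hypothetical frame: eight districts with 40 households each.
districts = {
    f"district_{d}": [f"{d}-household-{i}" for i in range(1, 41)]
    for d in range(1, 9)
}
sample = two_stage_sample(districts, n_clusters=3, n_per_cluster=5, seed=3)
for district, households in sample.items():
    print(district, households)
```

A list of households is needed only for the districts actually selected, which is why the design suits populations for which no complete sampling frame exists.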

Advantages

It requires less time, labour and money.

More convenient, effective and flexible.

Disadvantages

The procedure of estimating Standard Error is complicated.

It is difficult for a non-statistician to follow this method.

Selection of Appropriate Method of Sampling

Factors influencing Selection of the Method of Sampling

Nature of the Problem

Size of the Universe

Size of the Sample

Availability of Money and time

Sample Design

It is a plan for drawing a sample from a population. It involves making decisions on the following questions:

What is the relevant population?

What method of sampling technique shall we use?

What sampling frame shall we use?

What should be the size of the sample?

How much will the sample cost?

The Sample size should neither be too small nor too large. It should be optimum.

Optimum size is that which fulfils the requirements of efficiency, representativeness, reliability and flexibility.

Size of the Sample

The following factors should be considered while deciding the sample size.

1. The size of the Universe

2. The resources available

3. The degree of accuracy or precision desired

4. Homogeneity or Heterogeneity of the Universe

5. Nature of the Study

6. Methods of the Sampling adopted

7. Nature of the Respondents

Mathematical Formula for Determining the Sample Size

Sample Size: n = (Zσ/d)²

n = Sample Size

Z = Standard normal value at the specified level of confidence

σ = Standard Deviation of the population

d = Maximum permissible difference between the population mean and the sample mean (margin of error)

Example (Determining Sample Size)

Determine the sample size if σ = 6, population mean = 25, sample mean = 23, and the desired level of confidence is 99%.

n = (Zσ/d)²

σ = 6, d = 25 − 23 = 2, Z = 2.576 (at the 1% level of significance)

Therefore, n = [(2.576 × 6)/2]²

= [7.728]² = 59.72 ≈ 60 (rounded up)
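The same calculation can be checked with a short script. This is a sketch rather than a standard routine: the function name `sample_size` is hypothetical, and Z is taken from the standard normal distribution in Python's `statistics` module instead of a printed table, so it carries more decimal places than 2.576.

```python
import math
from statistics import NormalDist


def sample_size(sigma, d, confidence):
    """n = (Z * sigma / d)^2, rounded up to the next whole unit."""
    # Two-sided Z value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / d) ** 2)


n = sample_size(sigma=6, d=2, confidence=0.99)
print(n)  # 60, matching the worked example
```

Rounding up rather than to the nearest integer keeps the margin of error at or below d.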

Sampling Error and Non-sampling Error

Sampling Error

The error that arises from drawing inferences about the population on the basis of a few observations (a sample) is termed Sampling Error.

1. Biased Errors: Errors that arise from any bias in selection, estimation etc.

2. Unbiased Errors: Errors that arise from chance differences between the members of the population included in the sample and those not included.

How to reduce it? By increasing the sample size, which reduces the unbiased (chance) errors.

Non-Sampling Errors

Non-sampling errors arise from one or more of the following factors:

1. Data specification that is inadequate or inconsistent with the objectives of the study

2. Inappropriate statistical unit

3. Inaccurate or inappropriate method of data collection

4. Lack of trained and experienced investigators

5. Lack of inspection and supervision

6. Non-response

7. Errors in data processing operations such as coding, verification etc.

8. Errors during presentation and printing of tabulated results

How to Judge the Reliability of Samples

More samples of the same size should be taken from the same universe and their results compared. If the results are similar, the sample can be considered reliable.

If the measurements of the universe are known, they should be compared with the measurements of the sample. If they are similar, the sample can be considered reliable.

A sub-sample should be taken from the sample and studied. If the results of the sample and the sub-sample are similar, the sample can be considered reliable.
