Sampling MSc Class
SAMPLING: Refers to the process of selecting a portion of the
population to represent the entire population.
SAMPLE: Consists of a subset of the units that comprise the
population. It is the group from which measurements are taken, or
from which the data are obtained.
SAMPLING UNITS/ELEMENTS: The units that make up the
sample. The element is the most basic unit about which information
is collected. In nursing research, the elements are usually humans.
POPULATION: The entire aggregate of cases that meet a
designated set of criteria.
Target population: The aggregate of cases to whom the findings of
the study would be generalized.
Accessible population: The aggregate of cases that meet a set of
criteria and are accessible to the researcher as a pool of subjects for
the study.
The ultimate purpose of sampling is to make inferences about the
characteristics of the population from which the sample was
drawn.
E.g. To study the factors that motivate individuals to seek treatment
for their alcohol use problems, the target population may be all
individuals seeking treatment for alcohol use disorders at addiction
treatment facilities in India. The accessible population may be
alcohol-dependent individuals seeking treatment at a selected
deaddiction facility in South India.
SAMPLING CRITERIA (eligibility criteria): The criteria by
which the investigator makes decisions about whether an individual
would or would not be classified as a member of the population in
question.
Inclusion criteria: Criteria by which an individual would be included
in a study.
Exclusion criteria: Criteria by which an individual would be
excluded from participation in a study.
E.g. For a study evaluating the effectiveness of family involvement
in treatment for alcohol dependence, the sampling criteria can be:
Inclusion criteria:
• Patients who meet the criteria for alcohol dependence according to
ICD-10;
• Patients who have at least one family member who has lived with
them for a minimum of 6 months and is currently staying with them.
Exclusion criteria:
• Patients who have comorbid medical and/or psychiatric conditions;
• Patients who cannot read and write in English/regional language.
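Taken together, inclusion and exclusion criteria act as a screening rule: a prospective subject enters the study only if every inclusion criterion holds and no exclusion criterion applies. A minimal Python sketch of that rule for the example above follows; the `Candidate` fields and the `is_eligible` function are hypothetical names chosen for illustration, and a real study would rely on its own screening instrument.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical screening fields for the example study above.
    meets_icd10_alcohol_dependence: bool
    months_family_member_lived_with_patient: int
    family_member_currently_staying: bool
    has_comorbid_condition: bool  # medical and/or psychiatric
    can_read_and_write_study_language: bool  # English/regional language


def is_eligible(c: Candidate) -> bool:
    """True only if all inclusion criteria hold and no exclusion criterion applies."""
    meets_inclusion = (
        c.meets_icd10_alcohol_dependence
        and c.months_family_member_lived_with_patient >= 6
        and c.family_member_currently_staying
    )
    meets_exclusion = c.has_comorbid_condition or not c.can_read_and_write_study_language
    return meets_inclusion and not meets_exclusion


# Usage: screen three hypothetical prospective subjects.
candidates = [
    Candidate(True, 12, True, False, True),  # eligible
    Candidate(True, 3, True, False, True),   # fails inclusion: under 6 months
    Candidate(True, 24, True, True, True),   # excluded: comorbid condition
]
eligible = [c for c in candidates if is_eligible(c)]
```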
STEPS IN SAMPLING:
Identify the target population
Identify the accessible population
Specify the eligibility criteria
Specify the sampling plan and sample size
Recruit the sample, according to the designated criteria – a
screening instrument may be necessary to determine whether a
prospective subject meets all the eligibility criteria for the study.
CHARACTERISTICS OF A GOOD SAMPLING DESIGN:
• Reproduces the characteristics of the population with the greatest possible accuracy.
• Is free from error due to bias, or due to the deliberate selection of the units that make up the sample.
• Keeps random sampling error as small as possible (it cannot be eliminated entirely, but it can be reduced and estimated).
• Does not suffer from incomplete coverage of the units selected for the study.
• Estimates the sample size carefully, using appropriate procedures, in order to achieve reliable results.
• Uses random sampling procedures as far as possible, to build representativeness into the sample and increase precision in the results obtained.
ADVANTAGES OF SAMPLING:
The most important aim of sampling is to obtain maximum information
about the phenomena under study, with the least sacrifice of time,
energy, and resources.
Economy in expenditure: If data are collected for the entire
population, the cost will be high. It is far more economical to
collect data from a sample, which is usually only a fraction of
the population.
Economy in time: Carrying out the study on a sample is less
time-consuming. Tabulation, analysis, etc. also take much less time.
Thus, sampling helps to speed up the project.
Greater scope and flexibility: Working with a sample simplifies the
study; e.g. extensive training of the team may not be required to
collect or handle the data. Thus, there is greater scope and
flexibility when a sample of the population is used for a study.
Greater accuracy: Because fewer units are handled, data collection
can be more complete and better supervised, giving a relatively
higher degree of accuracy.
More convenient: There are fewer organizational problems, due to a
limited area of operation.
Intensive and exhaustive data: As the numbers are limited, it is
possible to obtain intensive and exhaustive data. Also, rapport is
easier to establish with a sample than with the entire population.
Suitable when resources are limited.
DISADVANTAGES OF SAMPLING:
Chance of bias: Sampling may involve biased selection of the
subjects, thereby leading to the drawing of erroneous conclusions.
Difficulty in getting a representative sample: Selection of a
truly representative sample is very difficult, particularly when the
phenomena under study are of a complex nature. Sometimes,
selected subjects may have to be replaced when they refuse to
cooperate, or are inaccessible. Such substitutions change the
composition of the sample and can introduce bias.
Need for specialized knowledge: Sampling and estimating
sample size require specialized knowledge of sampling techniques,
statistical analysis, and calculation of probable error.
Impossibility of sampling: Sometimes the population is so
small, or so heterogeneous, that it is not possible to derive a
representative sample.
SAMPLING DESIGNS:
Sampling designs/methods can be grouped into two categories:
• Probability (random) sampling methods
• Nonprobability (nonrandom) sampling methods
The main difference between the two: Probability sampling
involves random selection of the subjects for a research study,
whereas, non-probability techniques involve nonrandom selection of
the subjects.
In probability samples, elements are chosen by a random selection
process, in which every element in the population has a known,
nonzero chance of being selected for the study; in simple random
sampling, every element has an equal, independent chance.
In nonprobability samples, elements are selected using
nonrandom methods, so that there is no way to estimate the
probability that each element has of being included in the sample.
As a result, some segments of the population may be systematically
under- or over-represented, which can introduce bias into the sample
and into the inferences drawn from the findings of the study.
A) Nonprobability sampling methods:
Convenience (accidental) sampling:
Convenience sampling involves use of the most conveniently
available individuals for recruitment into a study.
E.g. Distribution of questionnaires to the nursing students of a
particular class, to obtain information related to the variable under
investigation.
In clinical settings, this can happen when a researcher uses
volunteers for a clinical trial.
The disadvantage is that available subjects may not be representative
of the population to which the findings are being generalized.
A variant of convenience sampling is called ‘snowball sampling’ or
‘network sampling’. The snowballing process begins with a few
eligible subjects, and then continues on the basis of subject referrals,
until the desired sample size is reached.
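The snowballing process can be pictured as a referral queue. The Python sketch below assumes a hypothetical `referrals` mapping (who refers whom) and a screening function; it recruits the seed subjects first, then follows referrals in the order they arrive until the desired sample size is reached. Not following referrals from people who fail screening is a design assumption of this sketch, not a rule of the method.

```python
from collections import deque


def snowball_sample(seeds, referrals, is_eligible, target_size):
    """Recruit seed subjects, then follow referrals until target_size is reached.

    seeds: initial subjects known to the researcher
    referrals: dict mapping each subject to the people he/she refers (hypothetical)
    is_eligible: screening function applied to every prospective subject
    """
    sample = []
    screened = set()
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        if person in screened:  # the same person may be referred more than once
            continue
        screened.add(person)
        if not is_eligible(person):
            continue  # not recruited, and his/her referrals are not followed
        sample.append(person)
        queue.extend(referrals.get(person, []))
    return sample


# Usage with a small hypothetical referral network in which "C" fails screening.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
print(snowball_sample(["A"], referrals, lambda p: p != "C", target_size=4))
```

Here the sketch returns `['A', 'B', 'D']`: because "E" and "F" can only be reached through "C", they never enter the sample, which shows how strongly a snowball sample depends on who the seeds and referrers happen to be.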
Convenience sampling is the weakest form of sampling; however, it is
commonly used, because it is the only feasible way of recruiting
subjects for many studies.
Purposive/judgmental sampling:
In this method, sampling is done with a specific ‘purpose’ in mind.
That is, the researcher usually has one or more predefined groups
in mind for the study. Thus, one of the very first things
the researcher would do is to verify that a prospective respondent
does in fact meet the criteria for being included in the sample.
E.g. If one is interested in studying alcohol/drug use patterns among
women aged 30-40 years in high income groups in an urban
community, the researcher might size up the women he/she
encounters in the community, approach those who appear to fit
into this category, and ask if she will participate. The subject would
then be screened to ascertain that she does meet the criteria, before
proceeding further.
Purposive sampling can be very useful when it is required to reach a
targeted sample quickly. However, the researcher is likely to
overrepresent subgroups in the population that are more readily
accessible. The sample may not be representative of the population
to which the findings would be generalized.
Quota sampling: In this, the sample is selected nonrandomly
according to some fixed quota to represent the major characteristics
of the population. That is, the researcher identifies various segments
in the population, and tries to build some representativeness into the
sample, by determining the proportions of subjects needed from
these segments.
E.g. To study drug use patterns in a particular population, the
researcher might ensure that all adult age groups, genders,
socioeconomic strata, and educational levels are represented in the
sample, by sampling a proportional number of subjects from each of
these pre-identified groups.
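The quota logic can be sketched in a few lines of Python. The sketch assumes people are encountered one at a time (as in convenience sampling) and accepts whoever arrives first until each group's quota is full; the quotas, the `group_of` function, and the people listed are hypothetical. Because no random step is involved within a group, the result remains a nonprobability sample.

```python
def quota_sample(arrivals, quotas, group_of):
    """Fill fixed quotas nonrandomly, in the order people become available.

    arrivals: people in the order the researcher encounters them
    quotas: dict mapping each group to the number of subjects required
    group_of: function returning the quota group of a person
    """
    filled = {group: [] for group in quotas}
    for person in arrivals:
        group = group_of(person)
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # every quota is met; stop recruiting
    return filled


# Usage: hypothetical quotas of 3 women and 2 men (e.g. reflecting a 60:40 population split).
arrivals = [("P1", "M"), ("P2", "M"), ("P3", "M"), ("P4", "F"),
            ("P5", "F"), ("P6", "F"), ("P7", "F")]
result = quota_sample(arrivals, {"F": 3, "M": 2}, group_of=lambda p: p[1])
```

In this example "P3" is turned away because the male quota is already full, and "P7" is never needed; a group could equally be a combination such as (age group, gender).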
B) Probability sampling methods:
Simple random sampling: This is the most basic of the
probability sampling designs. After the population has been
identified and defined, the researcher establishes a ‘sampling frame’,
which is a list of the population elements from which the sample will
be chosen. Once the listing of the elements has been developed or
located, they are generally numbered consecutively. A table of
random numbers is then used to draw at random, a sample of the
desired size.
E.g. If the medical professionals in all corporate hospitals in
Bangalore are the population, then a list of all these professionals
would be the sampling frame. A table of random numbers would then
be used to draw at random a sample of medical professionals from
this list.
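Today a computer's random number generator usually replaces the printed table of random numbers. The Python sketch below mimics the steps described: number the sampling frame 1 to N, draw n distinct random numbers, and take the matching elements. The frame of 500 professionals is hypothetical, and the fixed seed is an optional choice that makes the draw reproducible.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Number the frame 1..N, draw n distinct random numbers, and return those elements.

    Every element has the same chance (n/N) of selection, and none is drawn twice.
    """
    if not 0 <= n <= len(frame):
        raise ValueError("n must be between 0 and the size of the sampling frame")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    numbers = rng.sample(range(1, len(frame) + 1), n)
    return [frame[i - 1] for i in sorted(numbers)]


# Usage: a hypothetical frame of 500 numbered medical professionals.
frame = [f"Professional {i}" for i in range(1, 501)]
sample = simple_random_sample(frame, 50, seed=2024)
```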
Advantage: Relatively easy way of obtaining a representative
sample.
Disadvantages:
- Requires a complete listing of the population elements, which may
not be available.
- This method can also be expensive and impractical, particularly
when the sample is large. The subjects may be too widely dispersed
geographically, which makes it difficult to recruit subjects using
simple random sampling.
Stratified random sampling:
Stratified random sampling involves two steps:
Dividing the population into mutually exclusive subgroups or ‘strata’;
Selecting a separate sample from each stratum through random sampling (e.g. simple random sampling, systematic random sampling).
A common basis for stratification is population characteristics: the population may be stratified on the basis of gender, age, race, socioeconomic status, etc.
Two types:
Proportionate stratified random sampling
Disproportionate stratified random sampling
Proportionate stratified random sampling: Proportional allocation draws from each stratum a number of subjects proportional to that stratum's share of the total population. That is, it uses the same sampling fraction for each stratum.
E.g. If a sample of 1000 is to be drawn from a population of 10000
in a particular community, the population can be divided by age and
gender, and a separate sample recruited from each age-gender
stratum, using the same sampling fraction:
1000/10000 = 1/10
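A minimal Python sketch of proportionate allocation follows: group the population into strata, then draw a simple random sample of the same fraction from each. The four age-gender strata and their sizes are hypothetical; rounding each stratum's share to a whole number is an assumption of the sketch and can shift the total slightly when the shares are not whole numbers.

```python
import random
from collections import defaultdict


def proportionate_stratified_sample(population, stratum_of, fraction, seed=None):
    """Group units into strata, then draw a simple random sample of the same fraction from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for stratum, units in strata.items():
        n_h = round(len(units) * fraction)  # stratum sample size
        sample[stratum] = rng.sample(units, n_h)
    return sample


# Usage: a hypothetical community of 10000 split into four age-gender strata.
sizes = {("18-39", "F"): 2600, ("18-39", "M"): 2400, ("40+", "F"): 2600, ("40+", "M"): 2400}
population = [(stratum, i) for stratum, size in sizes.items() for i in range(size)]
sample = proportionate_stratified_sample(population, lambda unit: unit[0], 1000 / 10000, seed=1)
print({stratum: len(units) for stratum, units in sample.items()})
```

Each stratum contributes one-tenth of its members (260, 240, 260 and 240 units), so the sample of 1000 mirrors the population's age-gender make-up.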
Disproportionate stratified random sampling: Disproportionate
allocation uses different sampling fractions in the strata (optimum
allocation is one form of it). This may become necessary when one
or more strata are extremely small, in which case disproportionate
allocation randomly oversamples the small group. This ensures that
there are enough subjects in each stratum to make meaningful
subgroup inferences.
E.g. Suppose a sample of 100 is to be drawn from a population of
1000, of which 850 are urban, 100 are semiurban, and 50 are rural,
and it is decided that at least 25 people are needed from each
stratum to carry out analyses by strata. If the researcher samples 50
from the urban stratum and 25 each from the semiurban and rural
strata, a different sampling fraction is used in each stratum: 50/850
(1/17) for urban, 25/100 (1/4) for semiurban, and 25/50 (1/2) for
rural areas.
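Because small strata are oversampled, a common companion step (not described above) is to weight each stratum by its population share when estimating an overall value, using a design weight of N_h/n_h per subject. The Python sketch below computes the sampling fractions and weights for the urban, semiurban and rural example, then contrasts a weighted overall estimate with a naive unweighted one; the stratum results are hypothetical.

```python
def allocation_table(stratum_sizes, stratum_samples):
    """Return each stratum's sampling fraction n_h/N_h and design weight N_h/n_h."""
    return {
        h: (stratum_samples[h] / N_h, N_h / stratum_samples[h])
        for h, N_h in stratum_sizes.items()
    }


def stratified_estimate(stratum_sizes, stratum_means):
    """Overall estimate that weights each stratum mean by its population share N_h/N."""
    N = sum(stratum_sizes.values())
    return sum(stratum_sizes[h] / N * stratum_means[h] for h in stratum_sizes)


sizes = {"urban": 850, "semiurban": 100, "rural": 50}
samples = {"urban": 50, "semiurban": 25, "rural": 25}
table = allocation_table(sizes, samples)
# urban: fraction 1/17, weight 17; semiurban: 0.25, weight 4; rural: 0.5, weight 2

# Hypothetical stratum results, e.g. the proportion reporting some characteristic.
means = {"urban": 0.20, "semiurban": 0.40, "rural": 0.60}
overall = stratified_estimate(sizes, means)  # about 0.24
naive = sum(samples[h] * means[h] for h in samples) / sum(samples.values())  # about 0.35
```

The unweighted figure overstates the population value because semiurban and rural subjects make up half of the sample but only 15% of the population.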
Advantages of stratified random sampling:
• Assures that the researcher will be able to represent not only the
overall population, but also key subgroups of the population,
including minority groups. Thus, it is the most effective way to
obtain findings for subgroups of the population, if that is a key
objective of the study.
• Usually gives greater statistical precision and representativeness
in the final sample than simple random sampling of the same size.
Disadvantages:
• Difficulty in obtaining a population list containing complete
information on the critical (stratifying) variables;
• Difficulty in establishing homogeneous strata in the population, and
in determining appropriate sample size to be drawn from each
stratum;
• Time-consuming, as the final sample must be drawn from multiple
enumerated listings;
• Large number of subjects required, to support the subdivisions in
the sample.
Cluster sampling: The most common procedure for large-scale surveys is cluster sampling. In cluster sampling, units are sampled successively at random; the first units to be sampled are large groupings, or clusters. When successive stages or levels are involved in the selection of the clusters, this approach is referred to as ‘multistage sampling’.
For instance, the usual procedure for selecting a sample of citizens for a national survey (e.g. to assess food consumption practices) is to successively sample such administrative units as states, districts, cities, blocks, and then households. The clusters can be selected either by simple random or by stratified random sampling methods.
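A two-stage version of this procedure can be sketched in Python: first select clusters at random, then select units at random within each chosen cluster. Further stages (state, district, block) would repeat the same step. The villages and households below are hypothetical, and taking a fixed number of households per village is an assumption of the sketch.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly select clusters. Stage 2: randomly select units within each chosen cluster.

    clusters: dict mapping a cluster name to the list of its units. Only the chosen
    clusters' unit lists are used in stage 2, so in practice those lists can be
    compiled after stage 1.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in sorted(chosen)
    }


# Usage: 30 hypothetical villages with 40 to 120 households each; pick 5 villages, 10 households each.
rng = random.Random(0)
villages = {
    f"Village {v}": [f"V{v}-H{h}" for h in range(1, rng.randint(40, 120) + 1)]
    for v in range(1, 31)
}
sample = two_stage_cluster_sample(villages, n_clusters=5, n_per_cluster=10, seed=7)
```

With a fixed number of households per village, households in large villages have a smaller chance of selection than those in small villages; surveys often correct for this by selecting clusters with probability proportional to size, or by weighting.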
Advantages:
• Ensures efficiency of administration: more economical, practical,
and time-saving, at lower cost;
• Does not require a complete frame of the whole population – it
requires a list of the members in the selected clusters only.
Disadvantages:
• A cluster may not truly be representative of the parent population.
Therefore, estimates made based on the clusters may be inaccurate;
• Often leads to an increase in the standard error of survey estimates, because units within the same cluster tend to resemble one another.
Systematic random sampling: Involves the selection of every
kth element from some list or group, such as every 10th person on a
patients’ list, or every 10th household on a list of households in a
village. The sampling interval, k, is the ratio of the population size
to the sample size, and sets the standard distance between the
elements chosen for the sample. The first number is chosen randomly
between 1 and k. Thereafter, every kth case, based on the sampling
interval, is recruited for the study.
The formula would be:
k = N/n (N=size of the universe; n=desired sample size).
E.g. If 100 households need to be sampled out of a total of 5000
households in a village, then applying the formula:
k = 5000/100 = 50.
Thus, every 50th household will be sampled in the village, after
picking a random integer between 1 and 50.
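The household example translates directly into a short Python sketch: compute k, pick a random start between 1 and k, then take every kth element. The list of 5000 households is hypothetical; when N is not an exact multiple of n, this sketch assumes the interval is rounded down (k = N // n), which is one of several conventions.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Random start between 1 and k, then every kth element, where k = N // n."""
    N = len(frame)
    if not 1 <= n <= N:
        raise ValueError("n must be between 1 and the size of the list")
    k = N // n  # sampling interval; exactly N/n when N is a multiple of n
    start = random.Random(seed).randint(1, k)  # randint includes both 1 and k
    positions = range(start, start + k * n, k)  # 1-based positions on the list
    return [frame[p - 1] for p in positions]


# Usage: 100 of 5000 hypothetical households, so k = 5000 // 100 = 50.
households = [f"Household {i}" for i in range(1, 5001)]
sample = systematic_sample(households, 100, seed=11)
```

Only the start is random; every later position is fixed by the interval, which is exactly the limitation listed among the disadvantages.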
Advantages:
• The main advantage is the ease with which the sample can be
drawn.
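• In some cases, it can even be more precise than simple or stratified
random sampling (for instance, when the list is ordered by a
characteristic related to the variable under study, so that the sample
is spread evenly across its range).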
Disadvantages:
• The method is not truly random – all the elements selected (except
for the first one) are predetermined by the constant interval.
• The subjects can arrange themselves in such a way as to be
selected (or not selected) for the study.
• The method can sometimes result in a badly biased sample, e.g.
when the list has a periodic pattern that coincides with the sampling
interval.
FACTORS INFLUENCING CHOICE OF SAMPLING TECHNIQUE
No one individual sampling plan can be recommended for all
situations!
Choice depends on such considerations as:
• the nature of the study;
• size of the universe;
• desired sample size;
• availability of resources, time;
• degree of precision required.
On the whole, probability sampling is more desirable, because of its
ability to build representativeness into the sample. However,
nonprobability techniques are often acceptable for pilot, exploratory,
or in-depth qualitative research.
SAMPLE SIZE
Sample size is important primarily because of its effect on
statistical power. Statistical power is the probability that a statistical
test will indicate a significant difference when the difference truly
exists.
A general principle is to use the largest sample possible. The
larger the sample, the more representative of the population it is
likely to be.
Some factors that can affect sample size decisions:
Nature of the investigation: Smaller samples are usually
sufficient for in-depth qualitative studies. On the other hand,
quantitative studies generally test hypotheses using formal statistical
procedures, and require larger samples to provide a meaningful
statistical test.
Homogeneity of the population: If the population is
homogeneous, then smaller samples are adequate. However, in most
studies, it is often safer to assume a fair degree of heterogeneity, in
which case a larger sample would be required.
Effect size: Effect size is concerned with the strength of the
relationship between the variables. If the independent and dependent
variables are strongly interrelated, then a relatively smaller sample
may be adequate to demonstrate this relationship statistically.
Attrition: This refers to loss of subjects during the course of the
study – which is a common problem in longitudinal studies.
Researchers should anticipate a certain amount of subject loss and
recruit the participants accordingly.
Number of variables: In general, the greater the number of
variables, the larger the sample should be.
Subgroup analyses: When a sample is divided to test for effects
in specific subgroups, the sample must be large enough to support
these divisions in the sample.
Sensitivity of the measures: In general, when the measuring
instrument is more susceptible to errors, larger samples are needed to
test hypotheses correctly. For instance, biophysiologic measures are
generally more sensitive and precise, so smaller samples may be
sufficient. On the other hand, tools that assess psychological
attributes contain a fair amount of measurement error, so relatively
larger samples are required.
Resources: The projected cost of using a particular sampling
strategy, manpower, time available, etc. can be some practical
considerations affecting sample size.
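Several of these factors (effect size, statistical power, attrition) can be combined in a simple calculation. The Python sketch below uses one common normal-approximation formula for comparing two group means, n = 2(z₁₋α/₂ + z_power)²/d², where d is the standardized effect size, and then inflates the result for expected attrition. The formula choice, the 5% significance level, 80% power, and 20% attrition rate are illustrative assumptions; other designs need other formulas, and software based on the t distribution gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two means (two-sided test).

    Normal-approximation formula: n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2,
    where d is the standardized effect size (difference in means divided by the SD).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def recruitment_target(n_required, attrition_rate):
    """Inflate the required sample so that about n_required remain after attrition."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be at least 0 and less than 1")
    return math.ceil(n_required / (1 - attrition_rate))


for d in (0.2, 0.5, 0.8):  # small, medium, and large standardized effects
    n = n_per_group(d)
    print(d, n, recruitment_target(n, 0.20))  # assumes 20% expected attrition
```

Under these assumptions a medium effect (d = 0.5) needs about 63 subjects per group, or 79 recruited if 20% are expected to drop out, while a small effect (d = 0.2) needs about 393 per group: the weaker the relationship, the larger the sample.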
SAMPLING ERROR
Sampling errors refer to the unavoidable errors that occur
whenever sampling is done. It is the discrepancy between the true value
(i.e. the actual population value), and the estimated value. Sampling
error is thus the deviation of the selected sample from the true
characteristics, traits, behaviors, or qualities of the entire population.
Sources of sampling error:
• Sampling bias: This exists when not all members of the sampling
frame have an equal, independent chance of being recruited
for the study - which is what happens when nonprobability sampling
methods are employed to draw the sample.
A biased sample can also result when a sample element is
substituted by another, because it was inaccessible. This may cause
the sample to lose its representativeness.
• Sampling variance: Sampling variance arises because, given the
design of the sample, many different sets of elements could have
been drawn by chance for the study. Even when all the elements have
an equal chance of being selected, the same sample design can yield
many different samples. This produces estimates that vary, and this
variation is the basis of ‘sampling variance’ of the sample statistics.
The basis of both sampling bias and sampling variance is the same –
not all elements of the sampling frame were measured.
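Sampling variance can be made visible with a small simulation: draw many simple random samples of the same size from one population and compare their means. The Python sketch below uses a hypothetical population of 3000 ages; even though every sample is drawn by the same random design, each produces a different estimate of the population mean.

```python
import random
import statistics


def repeated_sample_means(population, n, repetitions, seed=None):
    """Draw many simple random samples of size n and return the mean of each."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]


# Usage: a hypothetical population of 3000 ages (18 to 77 years, 50 people at each age).
population = [age for age in range(18, 78) for _ in range(50)]
means = repeated_sample_means(population, n=30, repetitions=200, seed=5)
print("population mean:", statistics.mean(population))  # 47.5
print("sample means range from", min(means), "to", max(means))
print("standard deviation of the sample means:", statistics.stdev(means))
```

The spread of the 200 sample means around 47.5 is the sampling variance of the mean for this design and sample size.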
Ways to reduce:
• By increasing the sample size: As a general rule, sampling error
decreases as the sample size increases.
• By adopting appropriate sampling designs: In general,
probability sampling techniques are more desirable, as they increase
the chances of obtaining a representative sample by recruiting
subjects randomly. For instance, using stratified random sampling
techniques would build more representativeness into the sample, by
ensuring that different subgroups are represented in the sample.
Although results obtained from samples selected randomly
are not free from error, using random selection techniques does
guarantee that any differences that exist between the actual and
estimated values are purely a function of chance. It is usually
possible to estimate the magnitude of the sampling error that has
resulted, when random sampling techniques are used.
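With a random sample, the magnitude of sampling error in a mean is commonly estimated by its standard error, s/√n, and an approximate 95% confidence interval of the mean ± 1.96 standard errors. The Python sketch below assumes a simple random sample from a large population (no finite population correction) and uses hypothetical test scores; quadrupling the sample size roughly halves the standard error, in line with the rule that sampling error decreases as sample size increases.

```python
import math
import random
import statistics


def mean_with_ci(sample, z=1.96):
    """Sample mean, its estimated standard error, and an approximate 95% confidence interval.

    Assumes a simple random sample from a large population (no finite population correction).
    """
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m, se, (m - z * se, m + z * se)


# Usage: hypothetical test scores; quadrupling n roughly halves the standard error.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(100_000)]
for n in (25, 100, 400):
    m, se, (low, high) = mean_with_ci(rng.sample(population, n))
    print(f"n={n}: mean={m:.1f}, SE={se:.2f}, 95% CI=({low:.1f}, {high:.1f})")
```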
NON-SAMPLING ERRORS
Non-sampling errors can be defined as errors arising during
the course of all study activities other than sampling.
These are not chance errors, and can be present in sample surveys as
well as censuses.
Sources:
Random errors: Random errors can be described as the
unexplained differences that exist between a true score and an
obtained score. Random error does not have any consistent effect
across the entire sample, and tends to cancel out when a large
sample is used. The important property of random error is that it adds
variability to the data but does not affect average performance of the
group.
Systematic errors: Systematic error is caused by any factors that
systematically affect measurement of the variable across the sample.
Unlike random error, systematic errors tend to be consistently either
positive or negative. Systematic errors tend to accumulate over the
entire sample, and often lead to bias in the final results of the study.
Bias caused by systematic errors cannot be reduced by increasing the
sample size. Systematic errors are a principal cause for concern, and
need to be identified and corrected to obtain accurate results.
Errors related to coverage of the population elements: This
occurs when units are omitted, duplicated or wrongly included.
Omissions are referred to as "undercoverage", while duplication and
wrongful inclusions are called "overcoverage". Coverage errors may
also occur in field procedures (e.g., while a survey is conducted, the
interviewer misses several households or persons).
Response errors: Response errors result when data is incorrectly
requested (e.g. poor interviewing skills), provided (e.g. due to faulty
recollections), received, or recorded.
Non-response errors: Occur when participants fail to cooperate
or respond, or provide incomplete information.
Processing errors: Processing errors sometimes emerge during
the preparation of the final data files e.g. when data are being coded
or edited.
Analysis errors: These may occur if the wrong analytical tools
are used. Errors that occur when the results are published may
also be considered analysis errors.
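The contrast between random and systematic errors described among these sources can be illustrated with a small simulation. The Python sketch assumes a simple additive model (observed score = true score + systematic bias + random error), which is a simplification; the true score of 50, the random error SD of 5, and the bias of 3 points are hypothetical.

```python
import random
import statistics


def observed_mean(true_scores, random_sd, systematic_bias, seed=None):
    """Mean of observed scores, where observed = true score + systematic bias + random error."""
    rng = random.Random(seed)
    observed = [t + systematic_bias + rng.gauss(0, random_sd) for t in true_scores]
    return statistics.mean(observed)


# Usage: every subject's hypothetical true score is 50.
for n in (10, 100, 10_000):
    true_scores = [50.0] * n
    random_only = observed_mean(true_scores, random_sd=5, systematic_bias=0, seed=n)
    with_bias = observed_mean(true_scores, random_sd=5, systematic_bias=3, seed=n)
    print(n, round(random_only, 2), round(with_bias, 2))
```

As n grows, the mean with only random error settles close to 50, while the mean with a systematic error settles close to 53: a larger sample removes the noise but leaves the bias untouched.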
Ways to reduce:
• By carefully pretesting the data collection instruments, and getting
adequate feedback from the respondents regarding how easy or hard
the measure was, and information about how the testing environment
affected their performance.
• If information is being gathered using people to collect the data
(such as interviewers or observers), they should be thoroughly
trained, so that they do not introduce error.
• When collecting the data, it is important to double-check the data
thoroughly. All data entry for computer analysis should be
thoroughly verified.
• Statistical procedures can be used to adjust for measurement error,
ranging from simple formulae that can be applied directly to the data,
to complex modeling procedures for modeling the error and its
effects.