Download - Sampling MSc Class

SAMPLING: Refers to the process of selecting a portion of the

population to represent the entire population.

SAMPLE: Consists of a subset of the units that comprise the

population. It is the group from which measurements are sought, or

the data is obtained.

SAMPLING UNITS/ELEMENTS: The units that make up the

sample. The element is the most basic unit about which information

is collected. In nursing research, the elements are usually humans.

POPULATION: The entire aggregate of cases that meet a

designated set of criteria.

Target population: The aggregate of cases to whom the findings of

the study would be generalized.

Accessible population: The aggregate of cases that meet a set of

criteria and are accessible to the researcher as a pool of subjects for

the study.

The ultimate purpose of sampling is to make inferences about the

characteristics of the population from which the sample was

drawn.

Eg: To study the factors that motivate individuals to seek treatment

for their alcohol use problems, the target population may be all

individuals seeking treatment for alcohol use disorders at addiction

treatment facilities in India. The accessible population may be

alcohol dependent individuals seeking treatment at a selected

deaddiction facility in South India.

SAMPLING CRITERIA (eligibility criteria): The criteria by

which the investigator makes decisions about whether an individual

would or would not be classified as a member of the population in

question.

Inclusion criteria: Criteria by which an individual would be included

in a study.

Exclusion criteria: Criteria by which an individual would be

excluded from participation in a study.

E.g. For a study evaluating the effectiveness of family involvement in treatment for alcohol dependence, the sampling criteria can be:

Inclusion criteria:

• Patients who meet the criteria for alcohol dependence according to

ICD-10;

• Patients who have at least one family member who has lived with them for a minimum of 6 months and is currently staying with them.

Exclusion criteria:

• Patients who have comorbid medical and/or psychiatric conditions;

• Patients who cannot read and write in English/regional language.
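The screening of a prospective subject against these criteria can be sketched in Python as follows. This is only an illustration: the `Candidate` record and all of its field names are hypothetical, chosen to mirror the example criteria above, and a real screening instrument would be more detailed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All field names are hypothetical; they mirror the example criteria only.
    meets_icd10_alcohol_dependence: bool
    months_family_member_cohabiting: int
    family_member_currently_present: bool
    has_comorbid_condition: bool
    literate_in_study_language: bool


def is_eligible(c: Candidate) -> bool:
    """Return True only if every inclusion criterion holds and no exclusion criterion applies."""
    included = (
        c.meets_icd10_alcohol_dependence
        and c.months_family_member_cohabiting >= 6
        and c.family_member_currently_present
    )
    excluded = c.has_comorbid_condition or not c.literate_in_study_language
    return included and not excluded


# Example: meets all inclusion criteria, no exclusion criterion applies.
candidate = Candidate(True, 8, True, False, True)
print(is_eligible(candidate))
```

Keeping inclusion and exclusion checks separate makes it easy to record why a prospective subject was screened out.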

STEPS IN SAMPLING:

Identify the target population

Identify the accessible population

Specify the eligibility criteria

Specify the sampling plan and sample size

Recruit the sample, according to the designated criteria – a

screening instrument may be necessary to determine whether a

prospective subject meets all the eligibility criteria for the study.

CHARACTERISTICS OF A GOOD SAMPLING DESIGN:

• Reproduces the characteristics of the population with the greatest possible accuracy.

• Is free from error due to bias, or due to the deliberate selection of the units that make up the sample.

• Keeps random sampling error as small as possible.

• Does not suffer from incomplete coverage of the units selected for the study.

• Carefully estimates the sample size using appropriate procedures, in order to achieve reliable results.

• Uses random sampling procedures as far as possible, to build representativeness into the sample and increase precision in the results obtained.

ADVANTAGES OF SAMPLING:

Most important aim of sampling – to obtain maximum information

about the phenomena under study, with the least sacrifice of time,

energy, and resources.

Economy in expenditure: If the data is collected for the entire

population, cost will be high. It is far more economical, when the

data are collected from a sample, which is usually only a fraction of

the population.

Economy in time: Carrying out the study on a sample is less

time-consuming. Tabulation, analysis, etc. also take much less time.

Thus, sampling helps to speed up the project.

Greater scope and flexibility: A sample simplifies things. E.g. extensive training of the team may not be required to collect or handle data. Thus, there is greater scope and flexibility when a sample of the population is used for a study.

Greater accuracy: Sampling ensures completeness and a relatively higher degree of accuracy.

More convenient: There are fewer organizational problems, due to a limited area of operation.

Intensive and exhaustive data: As the numbers are limited, it is possible to obtain intensive and exhaustive data. Also, rapport is easier to establish with a sample than with the entire population.

Suitable with limited resources.

DISADVANTAGES OF SAMPLING:

Chance of bias: Sampling may involve biased selection of the

subjects, thereby leading to the drawing of erroneous conclusions.

Difficulty in getting a representative sample: Selection of a

truly representative sample is very difficult, particularly when the

phenomena under study are of a complex nature. Sometimes,

selected subjects may have to be replaced when they refuse to

cooperate, or are inaccessible. This introduces a change in the

subjects to be studied.

Need for specialized knowledge: Sampling and estimating

sample size requires specialized knowledge of sampling techniques,

statistical analysis, and calculation of probable error.

Impossibility of sampling: Sometimes, the population is so small, or so heterogeneous, that it is not possible to derive a representative sample.

SAMPLING DESIGNS:

Sampling designs/methods can be grouped into two categories:

• Probability (random) sampling methods

• Nonprobability (nonrandom) sampling methods

The main difference between the two: Probability sampling

involves random selection of the subjects for a research study,

whereas, non-probability techniques involve nonrandom selection of

the subjects.

A random selection process is one in which every element in

the population has an equal, independent chance of being

selected for the study – which happens with probability samples.

In nonprobability samples, elements are selected using

nonrandom methods, so that there is no way to estimate the

probability that each element has of being included in the sample. So

it is likely that some segment of the population may be

systematically under- or over-represented. This can introduce bias into

the sample and the inferences drawn from the findings of the study.

A) Nonprobability sampling methods:

Convenience (accidental) sampling:

Convenience sampling involves use of the most conveniently

available individuals for recruitment into a study.

E.g. Distribution of questionnaires to the nursing students of a

particular class, to obtain information related to the variable under

investigation.

In clinical settings, this can happen when a researcher uses

volunteers for a clinical trial.

The disadvantage is that available subjects may not be representative

of the population to which the findings are being generalized.

A variant of convenience sampling is called ‘snowball sampling’ or

‘network sampling’. The snowballing process begins with a few

eligible subjects, and then continues on the basis of subject referrals,

until the desired sample size is reached.

Convenience sampling is the weakest form of sampling. However, it is commonly used, because it is the only feasible way of recruiting subjects for many studies.

Purposive/judgmental sampling:

In this method, sampling is done with a specific ‘purpose’ in mind.

That is, the researcher would usually have one or more predefined

groups he/she is seeking for a study. Thus, one of the very first things

the researcher would do is to verify that a prospective respondent

does in fact meet the criteria for being included in the sample.

E.g. If one is interested in studying alcohol/drug use patterns among women aged 30-40 years in high income groups in an urban community, the researcher might size up the women he/she encounters in the community, approach those who appear to fit this category, and ask whether they will participate. Each prospective subject would then be screened to ascertain that she does meet the criteria, before proceeding further.

Purposive sampling can be very useful when it is required to reach a

targeted sample quickly. However, the researcher is likely to

overrepresent subgroups in the population that are more readily

accessible. The sample may not be representative of the population

to which the findings would be generalized.

Quota sampling: In this, the sample is selected nonrandomly

according to some fixed quota to represent the major characteristics

of the population. That is, the researcher identifies various segments

in the population, and tries to build some representativeness into the

sample, by determining the proportions of subjects needed from

these segments.

E.g. To study drug use patterns in a particular population, the

researcher might ensure that characteristics such as all adult age

groups, gender, socioeconomic status, and educational levels, are

represented in the sample, by sampling a proportional number of

subjects in all these pre-identified groups.
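Quota filling can be sketched as below. This is a minimal illustration, not a prescribed procedure: the arrival list, the gender-only segments, and the quota sizes are all hypothetical. Note that subjects are accepted in the order they happen to be available, so the selection within each segment is still nonrandom.

```python
from collections import Counter


def quota_sample(stream, quotas, key):
    """Accept subjects in arrival order until every segment's quota is full."""
    counts = Counter()
    sample = []
    for person in stream:
        segment = key(person)
        # Accept the person only if their segment still has room.
        if counts[segment] < quotas.get(segment, 0):
            sample.append(person)
            counts[segment] += 1
        # Stop as soon as all quotas are met.
        if all(counts[s] >= q for s, q in quotas.items()):
            break
    return sample


# Hypothetical arrivals as (id, gender) pairs and hypothetical quotas.
arrivals = [(i, "F" if i % 3 == 0 else "M") for i in range(1, 31)]
quotas = {"F": 4, "M": 6}
sample = quota_sample(arrivals, quotas, key=lambda p: p[1])
print(len(sample), Counter(p[1] for p in sample))
```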

B) Probability sampling methods:

Simple random sampling: This is the most basic of the

probability sampling designs. After the population has been

identified and defined, the researcher establishes a ‘sampling frame’,

which is a list of the population elements from which the sample will

be chosen. Once the listing of the elements has been developed or

located, they are generally numbered consecutively. A table of

random numbers is then used to draw at random, a sample of the

desired size.

E.g. If the medical professionals in all corporate hospitals in Bangalore are the population, then a list of all these professionals would be the sampling frame. A table of random numbers would then be used to draw, at random, a sample of medical professionals from this list.

Advantage: Relatively easy way of obtaining a representative

sample.

Disadvantages:

- Requires a complete listing of the population elements, which may

not be available.

- This method can also be expensive and impractical, particularly

when the sample is large. The subjects may be too widely dispersed

geographically, which makes it difficult to recruit subjects using

simple random sampling.
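In practice, a pseudorandom number generator usually takes the place of a printed table of random numbers. The sketch below assumes a small, hypothetical sampling frame of 500 numbered professionals and draws 50 of them without replacement; the frame, the sample size, and the seed are illustrative only.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: a consecutively numbered list of elements.
sampling_frame = [f"professional_{i:04d}" for i in range(1, 501)]
sample = simple_random_sample(sampling_frame, 50, seed=42)
print(sample[:5])
```

Recording the seed alongside the frame allows the same sample to be redrawn later for audit purposes.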

Stratified random sampling:

Stratified random sampling involves two steps:

Dividing the population into mutually exclusive subgroups or ‘strata’;

Selecting a separate sample from each stratum through random sampling (e.g. simple random sampling, systematic random sampling).

A common basis for stratification is population characteristics: the population may be stratified on the basis of gender, age, race, socioeconomic status, etc.

Two types:

Proportionate stratified random sampling

Disproportionate stratified random sampling

Proportionate stratified random sampling: Proportional allocation uses a sampling fraction in each of the strata that is proportional to that of the total population. That is, it uses the same sampling fraction for each stratum.

E.g. If a sample of 1000 is to be drawn from a population of 10000 in a particular community, the population can be divided by age and gender, and a separate sample recruited from each age and gender stratum, using the same sampling fraction:

1000/10000 = 1/10
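The proportional allocation in this example can be sketched as follows. The four age-by-gender strata and their sizes are hypothetical (they are chosen only so that they total 10000); the sampling fraction of 1/10 is the one from the example.

```python
import random


def proportionate_stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample from every stratum using the same sampling fraction."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, round(len(units) * fraction))
        for name, units in strata.items()
    }


# Hypothetical stratum sizes that sum to a population of 10000.
stratum_sizes = {
    ("18-39", "F"): 3000,
    ("18-39", "M"): 2800,
    ("40+", "F"): 2200,
    ("40+", "M"): 2000,
}
strata = {
    name: [f"{name[0]}_{name[1]}_{i}" for i in range(size)]
    for name, size in stratum_sizes.items()
}

samples = proportionate_stratified_sample(strata, fraction=1 / 10, seed=11)
for name, units in samples.items():
    print(name, len(units))
print("total:", sum(len(units) for units in samples.values()))
```

When stratum sizes are not exact multiples of the fraction, rounding can make the total differ slightly from the target sample size.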

Disproportionate stratified random sampling (optimum

allocation): Disproportionate allocation uses different sampling

fractions in the strata. This may become necessary when one or more

strata are extremely small, in which case disproportionate allocation

randomly oversamples the small group. This would ensure that there

are enough subjects in each stratum, to make meaningful subgroup

inferences.

E.g. Suppose a sample of 100 is to be drawn from a population of 1000, of which 850 are urban, 100 are semiurban, and 50 are rural. It may be decided that at least 25 people are needed from each stratum, to carry out analyses by strata. If the researcher samples 50 from urban, and 25 each from semiurban and rural areas, a different sampling fraction is used in each stratum: 50/850 = 1/17, 25/100 = 1/4, and 25/50 = 1/2.
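The stratum-specific sampling fractions for this example can be worked out as in the sketch below. The stratum sizes and the allocation come from the example; representing units by the numbers 1 to N within each stratum, and the seed, are assumptions made for illustration.

```python
import random
from fractions import Fraction

stratum_sizes = {"urban": 850, "semiurban": 100, "rural": 50}
allocation = {"urban": 50, "semiurban": 25, "rural": 25}

# Sampling fraction per stratum = number sampled / stratum size.
sampling_fractions = {
    s: Fraction(allocation[s], stratum_sizes[s]) for s in stratum_sizes
}

rng = random.Random(7)
# Units in each stratum are represented by the integers 1..N (illustrative only).
samples = {
    s: rng.sample(range(1, stratum_sizes[s] + 1), allocation[s])
    for s in stratum_sizes
}

for s in stratum_sizes:
    print(s, sampling_fractions[s], len(samples[s]))
```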

Advantages of stratified random sampling:

• Assures that the researcher will be able to represent not only the

overall population, but also key subgroups of the population,

including minority groups. Thus, this is the only way to effectively

assure subgroup findings in the population, if that is a key objective

of the study.

• Ensures more statistical precision and representativeness of the

final sample, than simple random sampling.

Disadvantages:

• Difficulty in obtaining a population list containing complete critical

variable information;

• Difficulty in establishing homogeneous strata in the population, and

in determining appropriate sample size to be drawn from each

stratum;

• Time-consuming, as the final sample must be drawn from multiple

enumerated listings;

• Large number of subjects required, to support the subdivisions in

the sample.

Cluster sampling: The most common procedure for large scale surveys is cluster sampling. In cluster sampling, there is a successive random sampling of units. The first unit to be sampled is large groupings or clusters. When successive stages or levels are involved in the selection of the clusters, this approach is referred to as ‘multistage sampling’.

For instance, the usual procedure for selecting a sample of citizens for a national survey (e.g. to assess food consumption practices), is to successively sample such administrative units as states, districts, cities, blocks, and then households. The clusters can be selected either by simple, or by stratified sampling methods.

Advantages:

• Ensures efficiency of administration: more economical, practical, time saving, lower cost;

• Does not require a complete frame of the whole population – it

requires a list of the members in the selected clusters only.

Disadvantages:

• A cluster may not truly be representative of the parent population.

Therefore, estimates made based on the clusters may be inaccurate;

• Often leads to an increase in the standard error of survey estimates.
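A two-stage version of cluster sampling can be sketched as below: clusters are selected at random first, and households are then selected at random within the chosen clusters only. The ten districts of 40 households each, and the numbers drawn at each stage, are hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly pick clusters. Stage 2: randomly pick units within each picked cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }


# Hypothetical frame: 10 districts, each with 40 listed households.
clusters = {
    f"district_{d}": [f"d{d}_hh{h}" for h in range(1, 41)]
    for d in range(1, 11)
}
sample = two_stage_sample(clusters, n_clusters=3, n_per_cluster=5, seed=1)
for district, households in sample.items():
    print(district, households)
```

Only the households of the three selected districts need to be listed, which is the frame-related advantage noted above.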

Systematic random sampling: Involves the selection of every

kth element from some list or group, such as every 10th person on a

patients’ list, or every 10th household on a list of households in a

village. The sampling interval is the ratio of the population size to the sample size, and sets the standard distance between the elements chosen for the sample. The first number is randomly chosen between 1 and k.

Thereafter, every kth case, based on the sampling interval, is

recruited for the study.

The formula would be:

k = N/n (N=size of the universe; n=desired sample size).

E.g. If 100 households need to be sampled out of a total of 5000

households in a village, then applying the formula:

k = 5000/100 = 50.

Thus, every 50th household will be sampled in the village, after

picking a random integer between 1 and 50.
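The household example can be sketched as follows, assuming the households are simply numbered 1 to 5000 in list order. The function name and the seed are illustrative.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select every kth element after a random start between 1 and k, where k = N // n."""
    N = len(frame)
    k = N // n  # sampling interval
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random start, 1-based, inclusive of k
    positions = range(start, N + 1, k)
    return [frame[p - 1] for p in positions][:n]


households = list(range(1, 5001))  # households numbered 1..5000
sample = systematic_sample(households, 100, seed=3)
print(sample[:5], len(sample))
```

Once the random start is fixed, every later selection is determined by the interval, which is the sense in which the method is not fully random.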

Advantages:

• The main advantage is the ease with which the sample can be

drawn.

• In some cases, it can even be more precise than simple/stratified

random sampling (if the population is large enough, for instance).

Disadvantages:

• The method is not truly random – all the elements selected (except

for the first one) are predetermined by the constant interval.

• The subjects can arrange themselves in such a way so as to be

selected (or not selected) for the study.

• The method can sometimes result in a badly biased sample.

FACTORS INFLUENCING CHOICE OF SAMPLING TECHNIQUE

No one individual sampling plan can be recommended for all

situations!

Choice depends on such considerations as:

• the nature of the study;

• size of the universe;

• desired sample size;

• availability of resources, time;

• degree of precision required.

On the whole, probability sampling is more desirable, because of its

ability to build representativeness into the sample. However,

nonprobability techniques are often acceptable for pilot, exploratory,

or in-depth qualitative research.

SAMPLE SIZE

Sample size is important primarily because of its effect on

statistical power. Statistical power is the probability that a statistical

test will indicate a significant difference when the difference truly

exists.

A general principle is to use the largest sample possible. The

larger the sample, the more representative of the population it is

likely to be.

Some factors that can affect sample size decisions:

Nature of the investigation: Smaller samples are usually

sufficient for in-depth qualitative studies. On the other hand,

quantitative studies generally test hypotheses using formal statistical

procedures, and require larger samples to provide a meaningful

statistical test.

Homogeneity of the population: If the population is

homogeneous, then smaller samples are adequate. However, in most

studies, it is often safer to assume a fair degree of heterogeneity, in

which case a larger sample would be required.

Effect size: Effect size is concerned with the strength of the

relationship between the variables. If the independent and dependent

variables are strongly interrelated, then a relatively smaller sample

may be adequate to demonstrate this relationship statistically.

Attrition: This refers to loss of subjects during the course of the

study – which is a common problem in longitudinal studies.

Researchers should anticipate a certain amount of subject loss and

recruit the participants accordingly.

Number of variables: In general, the greater the number of

variables, the larger the sample should be.

Subgroup analyses: When a sample is divided to test for effects

in specific subgroups, the sample must be large enough to support

these divisions in the sample.

Sensitivity of the measures: In general, when the measuring

instrument is more susceptible to errors, larger samples are needed to

test hypotheses correctly. For instance, biophysiologic measures are

generally more sensitive, so smaller samples may be sufficient. On

the other hand, tools that assess psychological attributes contain a

fair amount of error, so relatively larger samples are required.

Resources: The projected cost of using a particular sampling

strategy, manpower, time available, etc. can be some practical

considerations affecting sample size.
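For the attrition factor above, one common convention (an assumption here, not a rule stated in these notes) is to inflate the required number of completers by the expected dropout rate: recruit n / (1 - attrition rate), rounded up. The figures below are hypothetical.

```python
def recruitment_target(n_required, attrition_pct):
    """Number to recruit so that about n_required remain after the expected percentage drops out."""
    if not 0 <= attrition_pct < 100:
        raise ValueError("attrition_pct must be in the range [0, 100)")
    remaining_pct = 100 - attrition_pct
    # Ceiling division using integers, to avoid floating-point rounding surprises.
    return -(-n_required * 100 // remaining_pct)


# Hypothetical study: 120 completers needed, 20% attrition expected.
print(recruitment_target(120, 20))
```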

SAMPLING ERROR

Sampling error refers to the unavoidable error that occurs

whenever sampling is done. It is the discrepancy between the true value

(i.e. the actual population value), and the estimated value. Sampling

error is thus the deviation of the selected sample from the true

characteristics, traits, behaviors, or qualities of the entire population.

Sources of sampling error:

• Sampling bias: This exists when all the members of the sampling

frame do not have an equal, independent chance of being recruited

for the study - which is what happens when nonprobability sampling

methods are employed to draw the sample.

A biased sample can also result when a sample element is

substituted by another, because it was inaccessible. This may cause

the sample to lose its representativeness.

• Sampling variance: Sampling variance arises because, given the

design of the sample, many different sets of elements could have

been drawn by chance for the study. Even when all the elements have

an equal chance of being selected, the same sample design can yield

many different samples. This produces estimates that vary, and this

variation is the basis of ‘sampling variance’ of the sample statistics.

The basis of both sampling bias and sampling variance is the same –

not all elements of the sampling frame were measured.
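Sampling variance can be illustrated with a small simulation: the same design (a simple random sample of 100) is applied repeatedly to the same population, and the resulting sample means differ from draw to draw. The population below is simulated and entirely hypothetical.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 10000 scores.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Apply the same sample design 200 times and record each sample mean.
sample_means = [
    statistics.fmean(rng.sample(population, 100)) for _ in range(200)
]

spread = statistics.stdev(sample_means)
print(f"population mean: {true_mean:.2f}")
print(f"lowest and highest sample mean: {min(sample_means):.2f}, {max(sample_means):.2f}")
print(f"standard deviation of the sample means: {spread:.2f}")
```

The spread of the sample means around the population mean is what the term sampling variance describes.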

Ways to reduce:

• By increasing the sample size: As a general rule, sampling error

decreases as the sample size increases.

• By adopting appropriate sampling designs: In general,

probability sampling techniques are more desirable, as they increase

the chances of obtaining a representative sample by recruiting

subjects randomly. For instance, using stratified random sampling

techniques would build more representativeness into the sample, by

ensuring that different subgroups are represented in the sample.

Although results obtained from samples selected randomly

are not free from error, using random selection techniques does

guarantee that any differences that exist between the actual and

estimated values are purely a function of chance. It is usually

possible to estimate the magnitude of the sampling error that has

resulted, when random sampling techniques are used.
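For a simple random sample, the usual estimate of the magnitude of sampling error for a mean is the standard error, s / sqrt(n), where s is the sample standard deviation and n the sample size. The sketch below uses a simulated, hypothetical population to show the standard error shrinking as n grows.

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


rng = random.Random(1)
# Hypothetical population of 20000 scores.
population = [rng.gauss(50, 10) for _ in range(20_000)]

se_by_n = {}
for n in (25, 100, 400, 1600):
    sample = rng.sample(population, n)
    se_by_n[n] = standard_error(sample)
    print(n, round(se_by_n[n], 2))
```

Because the standard error falls with the square root of n, quadrupling the sample size roughly halves it.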

NON-SAMPLING ERRORS

Non-sampling errors can be defined as errors arising during

the course of all activities during the study, other than sampling.

These are not chance errors, and can be present in sample surveys as

well as censuses.

Sources:

Random errors: Random errors can be described as the unexplained differences that exist between a true score and an obtained score. Random error does not have any consistent effect across the entire sample, and is generally cancelled out if a large sample is used. The important property of random error is that it adds variability to the data but does not affect the average performance of the group.

Systematic errors: Systematic error is caused by any factors that

systematically affect measurement of the variable across the sample.

Unlike random error, systematic errors tend to be consistently either

positive or negative. Systematic errors tend to accumulate over the

entire sample, and often lead to bias in the final results of the study.

Bias caused by systematic errors cannot be reduced by increasing the

sample size. Systematic errors are a principal cause for concern, and

need to be identified and corrected to obtain accurate results.

Errors related to coverage of the population elements: This

occurs when units are omitted, duplicated or wrongly included.

Omissions are referred to as "undercoverage", while duplication and

wrongful inclusions are called "overcoverage". Coverage errors may

also occur in field procedures (e.g., while a survey is conducted, the

interviewer misses several households or persons).

Response errors: Response errors result when data is incorrectly

requested (e.g. poor interviewing skills), provided (e.g. due to faulty

recollections), received, or recorded.

Non-response errors: Occur when participants fail to cooperate

or respond, or provide incomplete information.

Processing errors: Processing errors sometimes emerge during

the preparation of the final data files e.g. when data are being coded

or edited.

Analysis errors: These may occur if the wrong analytical tools

are used. Errors that occur during the publication of data results may

also be considered analysis errors.
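The contrast between random and systematic errors described above can be illustrated with a simulation. All values are hypothetical: true scores are simulated, random error is modelled as zero-mean noise, and systematic error as a constant offset of +3 on every measurement.

```python
import random
import statistics

rng = random.Random(2)

# Hypothetical true scores for a large sample.
true_scores = [rng.uniform(40, 60) for _ in range(5000)]

# Random error only: zero-mean noise added to each score.
random_only = [t + rng.gauss(0, 5) for t in true_scores]

# Random error plus a systematic error of +3 on every measurement.
with_bias = [t + rng.gauss(0, 5) + 3 for t in true_scores]

mean_true = statistics.fmean(true_scores)
print(f"true mean:              {mean_true:.2f}")
print(f"mean with random error: {statistics.fmean(random_only):.2f}")
print(f"mean with added bias:   {statistics.fmean(with_bias):.2f}")
print(f"sd true vs observed:    {statistics.stdev(true_scores):.2f} vs {statistics.stdev(random_only):.2f}")
```

With a large sample the random errors largely cancel in the mean while adding spread to the scores, whereas the offset of 3 remains in the mean however large the sample is.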

Ways to reduce:

• By carefully pretesting the data collection instruments, and getting

adequate feedback from the respondents regarding how easy or hard

the measure was, and information about how the testing environment

affected their performance.

• If information is being gathered using people to collect the data

(such as interviewers or observers), they should be thoroughly

trained, so that they do not introduce error.

• When collecting the data, it is important to double-check the data

thoroughly. All data entry for computer analysis should be

thoroughly verified.

• Statistical procedures can be used to adjust for measurement error,

ranging from simple formulae that can be applied directly to the data,

to complex modeling procedures for modeling the error and its

effects.
