“On Samples And Sampling”
Title drawn from Elisabeth Kübler-Ross's anagrammatic phraseologies: “On death and dying”; “On grief and grieving”; “Real taste of life (On life and living)”
Ehi Igumbor
School of Public Health
University of the Western Cape
“On Samples”
What is research all about?
  Gathering data, information or evidence about a subject or topic
  E.g. What is the outcome of HIV-associated tuberculosis in the era of HAART?
But how much data, information or evidence?
  Should every HIV-associated TB case on HAART be used?
  Should only 2 be left out?
  Should 50% of them be used?
Populations and Samples
Population
  Larger group to which research results are generalized
  Defined aggregate of persons, objects, or events that meet a specified set of criteria
Sample
  Sub-group of the population
  Serves as reference group for estimating characteristics of, or drawing conclusions about, the population
Populations and Samples
Population (Group about whom you wish to gather data defined by person, place and time)
Sample(Sub-group of total study population)
Why use a sample?
  Save time
  Save money
  Save energy
  Not practical to get everyone
  Less data, so less error (fewer opportunities to make mistakes): improved quality
Why Not? - Just as good!
But… Sampling Bias!
Are responses of sample members representative of the population?
No way to guarantee, but good sampling procedures help
Not so much size as representativeness:
  Gallup and Harris polls predicted a Nixon win using 2,000 voters (43% predicted, 42.9% result)
  The 1936 Literary Digest poll predicted an Alf Landon win with 57%, based on 2 million voters drawn from lists of automobile owners and telephone directories; Roosevelt won
Sampling Bias
Occurs when the individuals selected over- or under-represent certain population attributes that are related to the phenomenon under study
May be Conscious or Unconscious
Learning Objectives
Understand strategies for selecting a sample
Understand how to determine the required size of a sample
“On Sampling” – Determining Sampling Procedure
What do I want to know?
Does self-reported quality of life of patients with HIV-associated tuberculosis improve after HAART compared to before HAART?
Is the CD4 count in patients on HAART different from those not on HAART?
May involve simply comparing two indicators, or a more rigorous analysis of changes in patients on HAART and not on HAART to estimate the strength of the impact of HAART
Determining Sampling Procedure
What is my population?
  Need a good problem statement
  Everyone affected (may be geographical, demographical, economic, social, or other specific content of study)
  Should not be too narrow
  Sometimes the source of data is different from the sampling unit, e.g. household surveys
Determining Sampling Procedure
Remember
Populations are not necessarily restricted to human subjects:
  May include people, places, organizations, objects, animals, days or any unit of interest, e.g.
    Blood samples in an epidemiology study
    Housing units in a household survey
    Series of measurements in a test-retest reliability study
    Inventory of manufactured products in industrial quality control studies
Target Population and Accessible Population
Study of motor skills
Target or reference population:
  “ALL children with learning disabilities in South Africa today”
Accessible or experimental population:
  “ALL children identified as having a learning disability in Cape Town’s school system”
Inclusion and Exclusion Criteria
Inclusion Criteria: primary traits of the target and accessible populations that will qualify someone as a subject
Exclusion Criteria: factors that would preclude someone from being studied. (Are potentially confounding to the results)
Determining Sampling Procedure
To sample or not to sample?
  Is it feasible to use the population? Cost? Time?
  Sometimes a “census” of all is needed:
    Small population size
    Useful to know information on every individual
  Scope of study: rapid assessment or in-depth investigation
Types of Samples
Sampling Procedure
Non-probability
  Selection of samples is made by nonrandom methods, i.e. not based on chance
  No way to accurately estimate chance of inclusion or degree of sampling error
  Is convenient and economical
  Quality depends on knowledge, judgment and expertise of researcher
Non-Probability Samples
Haphazard Sampling
No conscious planning or consistent procedures are employed to select the sample units
Non-Probability Samples
Convenience or “accidental” Sampling
A unit is self-selected (e.g. volunteers) or easily accessible/available
E.g. consecutive sampling of patients
Although it may yield useful information, be cautious about making inferences!
Non-Probability Samples
Quota Sampling
A pre-determined number of units which have certain characteristics are selected
Controls for confounding effect of known characteristics of a population by selecting adequate numbers from each stratum
E.g “50 men and 50 women to be interviewed on a busy street”
Non-Probability Samples
Snowball Samples
Useful if it is hard to locate subjects with specific characteristics
Carried out in stages:
  Select a few subjects who meet the selection criteria
  Ask selected subjects to identify others who have the requisite characteristics
  Repeat the process of “chain referral” or “snowballing” until an adequate sample size is obtained
Non-Probability Samples
Purposive or Judgment Sampling
Researcher handpicks subjects on the basis of specific characteristics or attributes that are important to the research study
Units used are sometimes EXTREME or CRITICAL units
May be most useful to pre-test an instrument for a larger study, or in qualitative studies to ensure subjects have appropriate knowledge and will be good informants for the study
Probability Samples
Every element in the population has a known, nonzero probability of selection
Because the probability is known, results can be generalized (at least within a given level of precision) to the larger population
The risk of incorrectly generalizing to the larger population is lower, thus better than non-probability samples
Sampling Frame
A list of units or elements from which the sample is to be selected
Should list every element separately, once and only once, with nothing else appearing on the list
Common problems:
  Missing elements, non-coverage or incomplete frame
  Blanks or foreign elements
  Duplicate listings
  Clusters of elements combined into one listing
Sampling Frame
What do you do if you have a “poor” Sampling Frame?
BEFORE SELECTING SAMPLE:
Ignore or disregard the problem
Redefine population to fit sampling frame
Spend time and effort to fix the frame
What do you do if you have a “poor” Sampling Frame?
Missing elements: Use supplementary methods, e.g. active fieldwork to reach homeless individuals in a household-based survey
Foreign elements: Omit if identified
Duplicate elements: Select the first, last or current listing. Any unique feature?
Clusters: Use all, or randomly select one
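The frame fixes listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the facility names, the `clean_frame` helper and the idea of checking the frame against a known population list are all hypothetical, and in practice the population is rarely known this neatly.

```python
def clean_frame(raw_frame, population):
    """Drop blanks, foreign elements and duplicate listings from a frame.

    Returns the cleaned frame and the population elements missing from it.
    """
    seen = set()
    cleaned = []
    for entry in raw_frame:
        name = entry.strip()
        if not name:                 # blank listing
            continue
        if name not in population:   # foreign element: omit if identified
            continue
        if name in seen:             # duplicate listing: keep the first one
            continue
        seen.add(name)
        cleaned.append(name)
    missing = sorted(set(population) - seen)  # needs supplementary methods
    return cleaned, missing


# Hypothetical population and a deliberately "poor" frame.
population = {"Clinic A", "Clinic B", "Clinic C", "Clinic D"}
raw_frame = ["Clinic A", "", "Clinic B", "Clinic A", "Pharmacy X", "Clinic C "]

frame, missing = clean_frame(raw_frame, population)
print(frame)    # ['Clinic A', 'Clinic B', 'Clinic C']
print(missing)  # ['Clinic D']
```

Keeping the first listing is just one of the options named on the slide; the last or the current listing would do equally well as long as the rule is applied consistently.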
Probability Samples- Simple Random
Easiest and least complex
Equal chance for each element
Using a table of random numbers:
  Assign a number to each element in the list
  Select a starting point
  Determine the number of columns to use
  Select numbers from the table
  Discard any duplicates you select
  Select numbers until you obtain the desired sample size
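The random-number-table procedure can be mimicked in Python, with a pseudo-random generator standing in for the printed table. The sketch below is illustrative only: the student list and the `simple_random_sample` name are assumptions, not part of the slides.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from frame, each with an equal chance."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the frame size")
    rng = random.Random(seed)
    # Assign a number (1..N) to each element in the list.
    numbered = dict(enumerate(frame, start=1))
    chosen = set()
    # Keep drawing numbers; the set silently discards duplicates.
    while len(chosen) < n:
        chosen.add(rng.randint(1, len(frame)))
    return [numbered[i] for i in sorted(chosen)]


# Hypothetical frame of 100 students.
frame = [f"Student {i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

The standard library call `random.sample(frame, 10)` gives the same kind of sample in one line; the loop is spelled out here only to mirror the steps on the slide.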
Probability Samples- Stratified Random
Improves on the estimates of simple random sampling by randomly sampling the population within strata
3 types:
  Proportionate
  Disproportionate or Optimal
  Equal size
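As an illustration of the proportionate type, the sketch below draws from each stratum in proportion to its share of the population. The strata names and sizes are made up for the example, and `proportionate_stratified_sample` is a hypothetical helper.

```python
import random


def proportionate_stratified_sample(strata, n, seed=None):
    """Simple random sample within each stratum, sized by its population share."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / total)  # stratum's share of the sample
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical strata: 60, 30 and 10 students.
strata = {
    "Public Health": [f"PH-{i}" for i in range(1, 61)],
    "Physio": [f"PT-{i}" for i in range(1, 31)],
    "EMS": [f"EMS-{i}" for i in range(1, 11)],
}
sample = proportionate_stratified_sample(strata, 20, seed=7)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)  # {'Public Health': 12, 'Physio': 6, 'EMS': 2}
```

A disproportionate or equal-size design would change only the line that computes `k`.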
Probability Samples- Systematic Samples
Select the first element randomly and then every nth element on the list afterwards
The starting point is a number between 1 and n (e.g. between 1 and 10 when every 10th element is taken), randomly drawn from a table of random numbers
Gives each element an equal (but not independent) chance
Useful when you do not have a list but elements are arranged in space, e.g. house selection
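A minimal sketch of systematic selection follows, assuming a hypothetical list of 30 patients and a sampling interval of 3. The function name and the data are illustrative only.

```python
import random


def systematic_sample(frame, interval, seed=None):
    """Random start between 1 and interval, then every interval-th element."""
    rng = random.Random(seed)
    start = rng.randint(1, interval)      # 1-based random starting point
    return frame[start - 1::interval]     # every interval-th element after it


# Hypothetical list of 30 patients, taking every 3rd one.
patients = [f"Patient {i}" for i in range(1, 31)]
sample = systematic_sample(patients, 3, seed=42)
print(sample)
```

Once the start is fixed, the whole sample is determined, which is why the chances of selection are equal but not independent.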
Probability Samples- Cluster or Area Sample
A method of selecting sample units in which the unit contains a cluster of elements
The probability of selecting an element is the product of the probability of selecting its cluster and the probability of selecting the element within that cluster
Different from stratified sampling in that, ideally, elements within a cluster are heterogeneous (within a stratum they are homogeneous)
NB: In practice though, clusters tend to be homogeneous
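The sketch below shows one way a two-stage cluster sample might look in Python, together with the resulting probability of selection for a single element. The company names, cluster sizes and both helper functions are hypothetical, and equal-probability selection at each stage is an assumption of the example.

```python
import random
from fractions import Fraction


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick clusters, then randomly pick elements inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        k = min(n_per_cluster, len(units))
        sample[name] = rng.sample(units, k)
    return sample


def inclusion_probability(n_clusters, total_clusters, n_per_cluster, cluster_size):
    """P(cluster selected) x P(element selected within its cluster)."""
    within = Fraction(min(n_per_cluster, cluster_size), cluster_size)
    return Fraction(n_clusters, total_clusters) * within


# Hypothetical: 10 companies with 20 staff each; pick 5 companies, 4 staff in each.
clusters = {f"Company {c}": [f"{c}-{i}" for i in range(1, 21)] for c in "ABCDEFGHIJ"}
sample = two_stage_cluster_sample(clusters, 5, 4, seed=3)
p = inclusion_probability(5, 10, 4, 20)
print(p)  # 1/10
```

Interviewing everyone in each selected cluster corresponds to setting `n_per_cluster` to the cluster size, in which case the second factor is 1.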
PUTTING IT TOGETHER- SELECTING A SAMPLING DESIGN
Multi-faceted process
Depends on the amount of information available about the population:
  If characteristics are known: stratified random
  If little is known: less complex simple or systematic
  When a list is unavailable: cluster
  ALSO combined: stratified multi-staged cluster sampling
Determine the type of sampling used
A soccer coach selects 6 players from a group of boys aged 8 to 10, 7 players from a group of boys aged 11 to 12, and 3 players from a group of boys aged 13 to 14 to form a recreational soccer team.
Stratified
Determine the type of sampling used
A pollster interviews all human resource personnel in five different high tech companies.
Cluster
Determine the type of sampling used
An engineering researcher interviews 50 women engineers and 50 men engineers.
Stratified
Determine the type of sampling used
A medical researcher interviews every third cancer patient from a list of cancer patients at a local hospital.
Systematic
Determine the type of sampling used
A high school counselor uses a computer to generate 50 random numbers and then picks students whose names correspond to the numbers.
Simple random
Determine the type of sampling used
A student interviews classmates in his algebra class to determine how many pairs of jeans a student owns, on the average.
Convenience
Suppose UWC has 10,000 part-time students (the population). We are interested in the average amount of money a part-time student spends on books in an academic year. Asking all 10,000 students is an almost impossible task. Suppose we take two different samples.
First, we use convenience sampling and survey 10 students from a first-semester Masters in Public Health class. Many of these students have been attending the 2009 Summer School and taking an elective course on Epidemiology and Biostatistics in addition to their MPH core courses. The amounts of money they spend are as follows:
R128; R87; R173; R116; R130; R204; R147; R189; R93; R153
The second sample is taken from a list, held by the Division of Life Long Learning, of adult learners who take part-time classes, by taking every 5th student on the list, for a total of 10 students. They spend:
R50; R40; R36; R15; R50; R100; R40; R53; R22; R22
Problem 1
Do you think that either of these samples is representative of (or is characteristic of) the entire 10,000 part-time student population?
Problem 2
Since these samples are not representative of the entire population, is it wise to use the results to describe the entire population?
Now, suppose we take a third sample. We choose ten different part-time students from all disciplines which offer part-time studies (Public Health, Physio, EMS, etc). Each student is chosen using simple random sampling. Using a calculator, random numbers are generated and a student from a particular discipline is selected if he/she has a corresponding number. The students spend:
R180; R50; R150; R85; R260; R75; R180; R200; R200; R150
Problem 3
Do you think this sample is representative of the population?
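A quick way to see how far the three samples disagree is to compare their means. The sketch below uses the amounts exactly as given above; only the dictionary labels are made up for readability.

```python
from statistics import mean

# Amounts (in Rand) spent on books, copied from the three samples above.
samples = {
    "convenience (MPH class)": [128, 87, 173, 116, 130, 204, 147, 189, 93, 153],
    "every 5th on Life Long Learning list": [50, 40, 36, 15, 50, 100, 40, 53, 22, 22],
    "simple random across disciplines": [180, 50, 150, 85, 260, 75, 180, 200, 200, 150],
}

means = {name: mean(values) for name, values in samples.items()}
for name, value in means.items():
    print(f"{name}: R{value:.2f}")
# convenience (MPH class): R142.00
# every 5th on Life Long Learning list: R42.80
# simple random across disciplines: R153.00
```

The second sample's mean is less than a third of the other two, which is the kind of gap that should prompt the questions in Problems 1 to 3 about who each sample actually represents.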
Sample Size Determination
Determined by:
  Purpose of study
  Population size
  Risk of selecting a “bad” sample
  Allowable sampling error
Sample Size Criteria
Level of precision
Level of confidence or risk
Degree of variability
Level of Precision
Also called “Sampling error”
Range in which the true value of the population is estimated to be
So, 42% (+/- 2%): 40% - 44%
Confidence Level
Also called “Risk level”
Based on principle of Central Limit Theorem
95% CI – 95 out of 100 samples will have the true population value within the range of precision specified
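The "95 out of 100 samples" idea can be checked by simulation. The sketch below repeatedly samples from a population whose true proportion is 42% and counts how often the interval around the sample estimate contains that true value. The normal-approximation interval, the sample size of 500 and the 1,000 repetitions are assumptions chosen for the illustration.

```python
import random
from math import sqrt


def coverage(true_p=0.42, n=500, repetitions=1000, z=1.96, seed=0):
    """Fraction of simulated samples whose 95% interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # One simulated sample of n yes/no responses.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        # Normal-approximation margin of error around the sample estimate.
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / repetitions


result = coverage()
print(f"Intervals containing the true value: {result:.1%}")
```

The printed share should land close to 95%; replacing `z=1.96` with 2.576 (99%) or 1.645 (90%) shows the risk falling or rising accordingly.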
Confidence Level
Chance that sample you obtain does not represent the true population value is shown in shaded area
Risk reduces for 99% CI and increases for 90% CI
Degree of Variability
Distribution of attributes
  Heterogeneous: bigger sample
  Homogeneous: smaller sample
Note that 50% indicates a greater level of variability than 20% and 80%
0.5 is mostly used in conservative samples because it indicates maximum variability
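The reason 0.5 is the conservative choice is that the variability of a proportion, p(1 - p), peaks there. A short check, with `variability` as an illustrative helper name:

```python
def variability(p):
    """Variance of a yes/no attribute whose population proportion is p."""
    return p * (1 - p)


for p in (0.2, 0.5, 0.8):
    print(p, round(variability(p), 2))
# 0.2 0.16
# 0.5 0.25
# 0.8 0.16

# Search a grid of proportions for the one with the largest variability.
grid = [i / 100 for i in range(1, 100)]
most_variable = max(grid, key=variability)
print(most_variable)  # 0.5
```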
Strategies for determining Sample Size
Using a Census for small populations
Using a Sample Size of a Similar Study
Using Published Tables
Using Formula to Calculate a Sample Size
Using a Census for small populations
Use the entire population as the sample
May be useful in small populations (<200), cost permitting
Why use this?
  Eliminates sampling error
  Provides individual-level data
  “Fixed costs”, e.g. of questionnaire design, are the same whatever the sample size
  Virtually the entire population would have to be in the sample in small populations anyway
Using a Sample Size of a Similar Study
Could be a valuable approach
But without reviewing the procedures employed, may run risk of repeating errors made previously
Review literature to get guidance on “typical” sample size
Using Published Tables
Use published tables which provide the sample size for a given set of criteria
Sample sizes in tables reflect the number of OBTAINED responses (not necessarily the number of surveys mailed)
Tables assume normality in the distribution
Using Formulas to Calculate A Sample Size
Equation 1: (Fleiss 1981)
$$n = C\,\frac{p_c q_c + p_e q_e}{d^2} + \frac{2}{d} + 2$$
Equation 2: (Snedecor & Cochran 1989)
$$n = 1 + 2C\left(\frac{s}{d}\right)^2$$
Equation 3: (Yamane 1967)
$$n = \frac{N}{1 + N(e)^2}$$
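The first two equations can be tried out numerically. The sketch below reads Equation 1 as n = C(p_c q_c + p_e q_e)/d² + 2/d + 2 for comparing two proportions, and Equation 2 as n = 1 + 2C(s/d)² for comparing two means. The slides do not define the symbols, so the interpretation here is an assumption: C is taken as a constant set by the significance level and power, computed as (z for 1 − α/2 plus z for power) squared; s as the standard deviation; d as the difference to be detected; p_c and p_e as the two proportions with q = 1 − p. All input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def power_constant(alpha=0.05, power=0.80):
    """Assumed form of C: (z_{1-alpha/2} + z_{power}) squared."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def n_two_proportions(p_c, p_e, C):
    """Equation 1 as read above: comparing two proportions."""
    d = abs(p_c - p_e)
    return C * (p_c * (1 - p_c) + p_e * (1 - p_e)) / d ** 2 + 2 / d + 2


def n_two_means(s, d, C):
    """Equation 2 as read above: comparing two means."""
    return 1 + 2 * C * (s / d) ** 2


C = power_constant()                            # about 7.85
n_props = ceil(n_two_proportions(0.5, 0.3, C))  # hypothetical proportions
n_means = ceil(n_two_means(10, 5, C))           # hypothetical s = 10, d = 5
print(round(C, 2), n_props, n_means)            # 7.85 103 64
```

Halving the difference to be detected roughly quadruples the result in both formulas, which is usually the quickest way to see why small effects need large samples.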
Other Considerations
Formulas assume simple random sampling
Number needed for data analysis (e.g. multiple regression or log-linear analysis requires a bigger sample than simple descriptive analysis)
Sample size is often increased by 30% to compensate for non-response, and by 10% to compensate for persons who cannot be reached
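The two adjustments are simple arithmetic, shown here on a hypothetical calculated sample size of 400. The `inflate` helper is illustrative, and treating the two percentages as separate adjustments rather than combining them is an assumption.

```python
from math import ceil


def inflate(n, percent):
    """Increase a calculated sample size by a whole-number percentage, rounding up."""
    return ceil(n * (100 + percent) / 100)


base_n = 400  # hypothetical sample size from a formula or table
for_non_response = inflate(base_n, 30)
for_unreachable = inflate(base_n, 10)
print(for_non_response)  # 520
print(for_unreachable)   # 440
```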
Calculation Using Computer Programmes
Epi Info
Online software, e.g. Raosoft
EXAMPLE: Sample Size Calculation
Yamane’s formula:
$$n = \frac{N}{1 + N(e)^2}$$
Where
  n = Sample size
  N = Population size
  e = Level of precision or sampling error, which is ±5%
*Reference: Yamane, Taro. 1967. Statistics, An Introductory Analysis, 2nd Ed. New York: Harper and Row.
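# of Health Facilities per Province
Column headings on the slide (printed as rotated text, so their left-to-right order is not preserved here): District; Total No. Health Facilities; Provincially Aided; Public/Private clinic; Specialised-TB; Specialised-Psychiatric; Specialised-Orthopaedic; National Central; Provincial Tertiary; Regional; CHC; Private hospital. The last figure in each row is the Total No. Health Facilities.
Eastern Cape | 1 | 6 | 2 | 45 | 10 | 4 | 1 | 18 | 665 | 31 | 16 | 783
Free State | 1 | 0 | 5 | 25 | 0 | 1 | 0 | 0 | 231 | 30 | 14 | 293
Gauteng | 4 | 0 | 11 | 8 | 3 | 4 | 0 | 0 | 323 | 30 | 93 | 383
Northern Cape | 0 | 1 | 1 | 20 | 3 | 0 | 0 | 0 | 83 | 16 | 5 | 124
KwaZulu-Natal | 1 | 2 | 13 | 43 | 7 | 4 | 0 | 0 | 524 | 16 | 24 | 610
North West | 0 | 5 | 0 | 26 | 0 | 2 | 0 | 0 | 310 | 55 | 7 | 398
Mpumalanga | 0 | 2 | 3 | 23 | 5 | 0 | 0 | 0 | 209 | 38 | 6 | 280
Limpopo | 0 | 2 | 6 | 32 | 1 | 2 | 0 | 0 | 430 | 26 | 3 | 499
Western Cape | 3 | 0 | 8 | 22 | 2 | 5 | 0 | 15 | 358 | 72 | 33 | 485
Total | 10 | 18 | 49 | 244 | 31 | 22 | 1 | 33 | 3133 | 314 | 201 | 3855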
Source: Digital Healthcare Solutions (PTY) LTD. Comprehensive Health Services Information for Southern Africa: Hospital & Nursing YearBook, 2007.
Sample Size Calculation:
$$n = \frac{N}{1 + N(e)^2}$$
Total number of health facilities in the study: 350
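Yamane's formula is easy to evaluate directly. The sketch below applies it to the 3855 facilities in the provincial table with e = 0.05; the function name is illustrative. Note that the formula as written yields roughly 362, a little more than the 350 facilities reported for the study, and the slides do not say how that figure was rounded or adjusted.

```python
def yamane_sample_size(N, e=0.05):
    """Yamane (1967): n = N / (1 + N * e^2)."""
    return N / (1 + N * e ** 2)


# N taken from the 'Total' row of the facilities table; e is the stated +/-5%.
n = yamane_sample_size(3855, 0.05)
print(round(n, 1))  # 362.4
```

Because N e² quickly dominates the denominator, the result approaches 1/e² = 400 as the population grows, so very large populations need hardly more than 400 units at this level of precision.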
Sampling Techniques
Multi-Stage Sampling
  Primary sampling unit
  Stratification by district (Selection Bias)
  Levels of care
  Rural/Urban
Sample Proportional Size
Sampling Weight:
Province | Total # of health facilities | Weighted Sample
Eastern Cape | 783 | 71
Free State | 293 | 27
Gauteng | 383 | 35
Northern Cape | 124 | 11
KwaZulu-Natal | 610 | 55
North West | 398 | 36
Mpumalanga | 280 | 25
Limpopo | 499 | 45
Western Cape | 485 | 44
Total | 3855 | 350
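The weighted sample column appears to follow proportional allocation: each province's share of the 350 facilities matches its share of the 3855 total. The sketch below reproduces the column under that assumption; the function name is illustrative, and the counts are copied from the table.

```python
# Total number of health facilities per province, from the table above.
facilities = {
    "Eastern Cape": 783, "Free State": 293, "Gauteng": 383,
    "Northern Cape": 124, "KwaZulu-Natal": 610, "North West": 398,
    "Mpumalanga": 280, "Limpopo": 499, "Western Cape": 485,
}


def proportional_allocation(counts, n):
    """Allocate a sample of n across strata in proportion to their size."""
    total = sum(counts.values())
    return {name: round(n * count / total) for name, count in counts.items()}


allocation = proportional_allocation(facilities, 350)
for province, size in allocation.items():
    print(f"{province}: {size}")
print("Sum of rounded allocations:", sum(allocation.values()))  # 349
```

Each provincial figure matches the table, but because of rounding the allocations add up to 349 rather than the 350 shown in the Total row.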
Sampling Techniques
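# of Facilities Selected for the study
Column headings on the slide (printed as rotated text, so their left-to-right order is not preserved here): Weighted Sample; National Central; Provincial Tertiary; Regional; District; Specialised-TB; Specialised-Psychiatric; Specialised-Orthopaedic; Private hospital; Provincially Aided; Clinic; CHC; Hospices. The last figure in each row is the province's Weighted Sample.
Eastern Cape | 1 | 2 | 1 | 5 | 1 | 1 | 1 | 4 | 47 | 3 | 3 | 1 | 70
Free State | 1 | 0 | 1 | 2 | 0 | 1 | 0 | 0 | 16 | 2 | 3 | 1 | 27
Gauteng | 1 | 0 | 2 | 1 | 1 | 1 | 0 | 0 | 18 | 2 | 3 | 8 | 36
Northern Cape | 0 | 1 | 1 | 3 | 1 | 0 | 0 | 0 | 2 | 1 | 1 | 1 | 11
KwaZulu-Natal | 1 | 1 | 2 | 4 | 1 | 1 | 0 | 0 | 40 | 2 | 1 | 2 | 55
North West | 0 | 1 | 0 | 2 | 0 | 1 | 0 | 0 | 25 | 1 | 5 | 1 | 36
Mpumalanga | 0 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 15 | 1 | 3 | 1 | 25
Limpopo | 0 | 1 | 2 | 3 | 1 | 1 | 0 | 0 | 34 | 0 | 2 | 1 | 45
Western Cape | 1 | 0 | 2 | 2 | 1 | 1 | 0 | 1 | 24 | 2 | 7 | 3 | 44
Total | 5 | 7 | 12 | 24 | 7 | 7 | 1 | 5 | 221 | 14 | 28 | 19 | 350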
BIBLIOGRAPHY
Israel GD. (1992) Sampling the evidence of extension program impact. University of Florida IFAS Extension PEOD5. (http://edis.ifas.ufl.edu.)
Israel GD. (1992) Determining Sample Size. University of Florida IFAS Extension PEOD6 (http://edis.ifas.ufl.edu.)
Portney LG and Watkins MP. (2000). Foundations of clinical research – applications to practice. 2nd Ed. Chapter 8 - Sampling
“I have collected a poesy of another man’s roses, and nothing but the thread that binds them together is my own”