
“On Samples And Sampling”

Title drawn from Elisabeth Kübler-Ross’s parallel phrasings: “On death and dying”; “On grief and grieving”; “Real taste of life (On life and living)”

Ehi Igumbor

School of Public Health

University of the Western Cape

“On Samples”

What is research all about?
  Gathering data, information or evidence about a subject or topic
  What is the outcome of HIV-associated tuberculosis in the era of HAART?

But how much data, information or evidence?
  Should every HIV-associated TB case on HAART be used?
  Should only 2 be left out?
  Should 50% of them be used?

Populations and Samples

Population
  Larger group to which research results are generalized
  Defined aggregate of persons, objects, or events that meet a specified set of criteria

Sample
  Sub-group of the population
  Serves as reference group for estimating characteristics of, or drawing conclusions about, the population

Populations and Samples

Population (Group about whom you wish to gather data defined by person, place and time)

Sample (Sub-group of total study population)

Why use a sample?
  Saves time
  Saves money
  Saves energy
  Not practical to get everyone
  Less data, so fewer opportunities to make mistakes: improved quality

Why Not? - Just as good!

But… Sampling Bias!

Are responses of sample members representative of the population?

No way to guarantee, but good sampling procedures help

Not so much size as representativeness:
  Gallup and Harris polls predicted the 1968 Nixon win using 2,000 voters (43% predicted, 42.9% result)
  The 1936 Literary Digest poll predicted an Alf Landon win with 57%, based on 2 million voters drawn from lists of automobile owners and telephone directories (Landon lost to Roosevelt in a landslide)

Sampling Bias

Occurs when individuals selected over- or under- represent certain population attributes that are related to the phenomenon under study

May be Conscious or Unconscious
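The point that representativeness matters more than size can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population of 100,000 voters, the support rates and the "biased frame" are hypothetical and are not taken from the polls mentioned earlier.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20% of voters appear on a biased frame
# (think of a car-owner list) and support candidate A far more often.
population = []
for _ in range(100_000):
    on_frame = rng.random() < 0.20
    supports_a = rng.random() < (0.70 if on_frame else 0.35)
    population.append((on_frame, supports_a))

def share_supporting_a(voters):
    """Proportion of the given voters who support candidate A."""
    return sum(supports for _, supports in voters) / len(voters)

true_share = share_supporting_a(population)

# Small sample, drawn at random from the whole population.
small_random = rng.sample(population, 2_000)

# Much larger sample, but drawn only from the biased frame.
frame = [voter for voter in population if voter[0]]
large_biased = rng.sample(frame, 15_000)

print(f"population:       {true_share:.3f}")
print(f"random n=2,000:   {share_supporting_a(small_random):.3f}")
print(f"biased n=15,000:  {share_supporting_a(large_biased):.3f}")
```

Under these assumptions the small random sample lands close to the population value, while the much larger sample from the biased frame overshoots it badly.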

Learning Objectives

Understand strategies for selecting a sample

Understand how to determine the required size of a sample

“On Sampling”: Determining Sampling Procedure

What do I want to know?

Does self-reported quality of life of patients with HIV-associated tuberculosis improve after HAART compared to before HAART?

Is the CD4 count in patients on HAART different from those not on HAART?

May involve simply comparing 2 indicators, or a more rigorous analysis of changes in patients on HAART and not on HAART to estimate the strength of the impact of HAART

Determining Sampling Procedure

What is my population?
  Need a good problem statement
  Everyone affected (may be geographical, demographic, economic, social, or other specific content of study)
  Should not be too narrow
  Sometimes the source of data is different from the sampling unit, e.g. household surveys


Remember: populations are not necessarily restricted to human subjects
  May include people, places, organizations, objects, animals, days or any unit of interest, e.g.
    Blood samples in an epidemiology study
    Housing units in a household survey
    Series of measurements in a test-retest reliability study
    Inventory of manufactured products in industrial quality control studies

Target Population and Accessible Population

Study of motor skills

Target or reference population:
  “ALL children with learning disabilities in South Africa today”

Accessible or experimental population:
  “ALL children identified as having a learning disability in Cape Town’s school system”

Inclusion and Exclusion Criteria

Inclusion Criteria: primary traits of the target and accessible populations that will qualify someone as a subject

Exclusion Criteria: factors that would preclude someone from being studied. (Are potentially confounding to the results)


To sample or not to sample?
  Is it feasible to use the population? Cost? Time?
  Sometimes a “census” of all is needed
    Small population size
    Useful to know information on every individual
  Scope of study: rapid assessment or in-depth investigation

Types of Samples

Sampling Procedure

Non-probability
  Selection of samples is made by nonrandom methods, i.e. not based on chance
  No way to accurately estimate chance of inclusion or degree of sampling error
  Is convenient and economical
  Quality depends on knowledge, judgment and expertise of researcher

Non-Probability Samples

Haphazard Sampling

No conscious planning or consistent procedures are employed to select the sample units


Convenience or “accidental” Sampling

A unit is self-selected (e.g. volunteers) or easily accessible/available

E.g. consecutive sampling of patients

Although it may yield useful information, be cautious about making inferences!


Quota Sampling

A pre-determined number of units which have certain characteristics are selected

Controls for confounding effect of known characteristics of a population by selecting adequate numbers from each stratum

E.g “50 men and 50 women to be interviewed on a busy street”

Non-Probability Samples

Snowball Samples

Useful if hard to locate subjects with specific characteristics

Carried out in stages:
  Select a few subjects who meet the selection criteria
  Ask selected subjects to identify others who have the requisite characteristics
  Repeat the process of “chain referral” or “snowballing” until an adequate sample size is obtained

Non-Probability Samples

Purposive or judgment Sampling

Researcher handpicks subjects on the basis of specific characteristics or attributes that are important to the research study

Units used are sometimes EXTREME or CRITICAL units

May be most useful to pre-test an instrument for a larger study, or in qualitative studies to ensure subjects have appropriate knowledge and will be good informants for the study

Probability Samples

Every element in the population has a known, nonzero probability of selection

Because the probability is known, results can be generalized (at least within a given level of precision) to the larger population

Risk of incorrectly generalizing to the larger population is lower, thus better than non-probability samples

Sampling Frame

A list of units or elements from which the sample is to be selected

Should list every element separately, once and only once, with nothing else appearing on the list

Common problems:
  Missing elements, non-coverage or incomplete frame
  Blanks or foreign elements
  Duplicate listings
  Clusters of elements combined into one listing

Sampling Frame

What do you do with a “poor” Sampling Frame?

BEFORE SELECTING SAMPLE:

Ignore or disregard the problem

Redefine population to fit sampling frame

Spend time and effort to fix the frame

What do you do with a “poor” Sampling Frame?

Missing elements: use supplementary methods, e.g. active fieldwork to reach homeless individuals in a household-based survey

Foreign elements: omit if identified

Duplicate elements: select the first, last or current listing. Any unique feature?

Clusters: use all, or randomly select one

Probability Samples- Simple Random

Easiest and least complex

Equal chance for each element

Using a table of random numbers:
  Assign a number to each element in the list
  Select a starting point
  Determine the number of columns to use
  Select numbers from the table
  Discard any duplicate you select
  Select numbers until you obtain the desired sample size
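The table-of-random-numbers procedure can be mimicked with a pseudo-random generator. The sketch below is illustrative only: the frame of 500 patient IDs and the function name `simple_random_sample` are hypothetical, and Python's `random.sample` stands in for the printed table (it draws without replacement, so duplicates never need discarding).

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame, each with an equal chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # without replacement, so no duplicates

# Hypothetical sampling frame: patient IDs 1..500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 25, seed=1)
print(sorted(sample))
```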


Probability Samples- Stratified Random

Improves on simple random estimates by dividing the population into strata and sampling randomly within each stratum

3 types:
  Proportionate
  Disproportionate or optimal
  Equal size
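As a rough sketch of the proportionate type, the code below allocates the sample across strata in proportion to stratum size and then draws a simple random sample inside each stratum. The strata (600 women, 400 men) and the function name are hypothetical.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to its list of elements.

    The total sample size n is split across strata in proportion to
    stratum size; each share is then drawn by simple random sampling.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / total)  # rounding can shift the overall total slightly
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical frame: 600 women and 400 men
strata = {
    "women": [f"W{i}" for i in range(600)],
    "men": [f"M{i}" for i in range(400)],
}
chosen = proportionate_stratified_sample(strata, 50, seed=7)
print({name: len(units) for name, units in chosen.items()})
```

A disproportionate or equal-size design would replace only the line that computes `k`.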


Probability Samples- Systematic Samples

Select the first element randomly and then every nth element on the list afterwards

Starting point is a number between 1 and n drawn from a table of random numbers (e.g. between 1 and 10 when every 10th element is taken)

Gives each element an equal (but not independent) chance

Useful when you do not have a list but elements are arranged in space, e.g. house selection
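A minimal sketch of systematic selection, assuming a hypothetical list of 100 houses and a sampling interval of 10; the function name is illustrative.

```python
import random

def systematic_sample(frame, interval, seed=None):
    """Random start within the first interval, then every interval-th element."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # 0-based index, i.e. one of elements 1..interval
    return frame[start::interval]

frame = list(range(1, 101))  # hypothetical list of 100 houses
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```

Once the start is fixed, every other selection is determined, which is why the chances are equal but not independent.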


Probability Samples- Cluster or Area Sample

A method of selecting sample units in which each unit contains a cluster of elements

The probability of selecting an element is the product of the probability of selecting its cluster and the probability of selecting the element within that cluster

Different from stratified sampling in that, ideally, the elements within a cluster are heterogeneous (within a stratum they are homogeneous)

NB: In practice, though, clusters tend to be homogeneous
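The sketch below shows a two-stage cluster sample and the resulting selection probability for one element. It assumes clusters are chosen by simple random sampling and that the same number of elements is then drawn at random within each chosen cluster; the 20 clinics of 40 patients each are hypothetical.

```python
import random
from fractions import Fraction

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick clusters, then randomly pick elements within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

def inclusion_probability(total_clusters, n_clusters, cluster_size, n_per_cluster):
    """P(cluster selected) multiplied by P(element selected within its cluster)."""
    return Fraction(n_clusters, total_clusters) * Fraction(n_per_cluster, cluster_size)

# Hypothetical frame: 20 clinics, 40 patients each
clusters = {
    f"clinic_{i}": [f"clinic_{i}_patient_{j}" for j in range(40)]
    for i in range(20)
}
sample = two_stage_cluster_sample(clusters, 5, 8, seed=11)
p = inclusion_probability(20, 5, 40, 8)
print(p)  # 5/20 * 8/40
```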


PUTTING IT TOGETHER- SELECTING A SAMPLING DESIGN

Multi-faceted process

Depends on the amount of information available about the population:
  If characteristics known: stratified random
  If little known: less complex simple or systematic
  When list unavailable: cluster
  ALSO combined: stratified multi-stage cluster sampling

Determine the type of sampling used

A soccer coach selects 6 players from a group of boys aged 8 to 10, 7 players from a group of boys aged 11 to 12, and 3 players from a group of boys aged 13 to 14 to form a recreational soccer team.



Stratified


A pollster interviews all human resource personnel in five different high tech companies.



Cluster


An engineering researcher interviews 50 women engineers and 50 men engineers.



Stratified


A medical researcher interviews every third cancer patient from a list of cancer patients at a local hospital.



Systematic


A high school counselor uses a computer to generate 50 random numbers and then picks students whose names correspond to the numbers.



Simple random


A student interviews classmates in his algebra class to determine how many pairs of jeans a student owns, on the average.



Convenience

Suppose UWC has 10,000 part-time students (the population). We are interested in the average amount of money a part-time student spends on books in an academic year. Asking all 10,000 students is an almost impossible task. Suppose we take two different samples.

First, we use convenience sampling and survey 10 students from a first-semester Masters in Public Health class. Many of these students have been attending the 2009 Summer School and taking elective courses on epidemiology and biostatistics in addition to their MPH core courses. The amount of money they spend is as follows:

R128; R87; R173; R116; R130; R204; R147; R189; R93; R153

The second sample is taken by using a list from the Division of Life Long Learning unit of adult learners who take part-time classes and taking every 5th student on the list, for a total of 10 students. They spend:

R50; R40; R36; R15; R50; R100; R40; R53; R22; R22

Problem 1

Do you think that either of these samples is representative of (or is characteristic of) the entire 10,000 part-time student population?

Problem 2

Since these samples are not representative of the entire population, is it wise to use the results to describe the entire population?

Problem 3

Now, suppose we take a third sample. We choose ten different part-time students from all disciplines which offer part-time studies (Public Health, Physio, EMS, etc). Each student is chosen using simple random sampling. Using a calculator, random numbers are generated and a student from a particular discipline is selected if he/she has a corresponding number. The students spend:

R180; R50; R150; R85; R260; R75; R180; R200; R200; R150

Do you think this sample is representative of the population?
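To compare the three samples, it may help to compute their means. The amounts are those listed in the example; the variable names are illustrative.

```python
from statistics import mean

# Amounts spent on books (rand), as listed for the three samples
convenience = [128, 87, 173, 116, 130, 204, 147, 189, 93, 153]
every_fifth = [50, 40, 36, 15, 50, 100, 40, 53, 22, 22]
simple_random = [180, 50, 150, 85, 260, 75, 180, 200, 200, 150]

for label, amounts in [
    ("MPH class (convenience)", convenience),
    ("Every 5th adult learner", every_fifth),
    ("Simple random, all disciplines", simple_random),
]:
    print(f"{label}: mean = R{mean(amounts):.2f}")
```

The means come out at R142.00, R42.80 and R153.00. The gap between the first two shows how strongly the estimate depends on which part of the population each sample happened to reach.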

Learning Objectives

Understand strategies for selecting a sample

Understand how to determine the required size of a sample

Sample Size Determination

Determined by:
  Purpose of study
  Population size
  Risk of selecting a “bad” sample
  Allowable sampling error

Sample Size Criteria

Level of precision

Level of confidence or risk

Degree of variability

Level of Precision

Also called “Sampling error”

Range in which the true value of the population is estimated to be

E.g. 42% (±2%) means the true value is estimated to lie between 40% and 44%

Confidence Level

Also called “Risk level”

Based on principle of Central Limit Theorem

95% confidence level: if the sampling were repeated, about 95 out of 100 samples would have the true population value within the range of precision specified

Confidence Level

Chance that sample you obtain does not represent the true population value is shown in shaded area

Risk reduces for 99% CI and increases for 90% CI
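The link between confidence level and risk can be made concrete with the critical values of the standard normal distribution. This sketch assumes a two-sided interval and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name is illustrative.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical value of the standard normal for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    z = z_for_confidence(level)
    print(f"{level:.0%} confidence: z = {z:.3f}, risk = {1 - level:.0%}")
```

A higher confidence level gives a larger z and therefore a wider interval for the same sample, or a larger sample for the same precision.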

Degree of Variability

Distribution of attributes:
  Heterogeneous: bigger sample
  Homogeneous: smaller sample

Note that a proportion of 50% indicates a greater level of variability than 20% or 80%

0.5 is mostly used for conservative sample sizes because it indicates maximum variability
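Why 0.5 is the conservative choice can be seen from the variance of a yes/no attribute, p(1 - p), which peaks at p = 0.5. A minimal sketch (the function name is illustrative):

```python
def binary_variance(p):
    """Variance of a yes/no attribute with proportion p."""
    return p * (1 - p)

for p in (0.2, 0.5, 0.8):
    print(f"p = {p}: p(1 - p) = {binary_variance(p):.2f}")
```

The values are 0.16, 0.25 and 0.16, so a sample size planned with p = 0.5 is large enough for any other proportion.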

Strategies for determining Sample Size

Using a Census for small populations

Using a Sample Size of a Similar Study

Using Published Tables

Using Formula to Calculate a Sample Size

Using a Census for small populations

Use the entire population as the sample

May be useful in small populations (<200), cost permitting

Why use this?
  Eliminates sampling error
  Provides individual-level data
  “Fixed costs”, e.g. of questionnaire design etc.
  Virtually the entire population would have to be in the sample in small populations anyway

Using a Sample Size of a Similar Study

Could be a valuable approach

But without reviewing the procedures employed, may run risk of repeating errors made previously

Review literature to get guidance on “typical” sample size

Using Published Tables

Use published tables which provide the sample size for a given set of criteria

Sample sizes in tables reflect the number of OBTAINED responses (not necessarily the number of surveys mailed)

Tables assume the attributes being measured are normally distributed

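Using Formulas to Calculate A Sample Size

Equation 1: (Fleiss 1981)

$$ n = C\,\frac{p_c q_c + p_e q_e}{d^{2}} + \frac{2}{d} + 2 $$

Equation 2: (Snedecor & Cochran 1989)

$$ n = 1 + 2C\left(\frac{s}{d}\right)^{2} $$

Equation 3: (Yamane 1967)

$$ n = \frac{N}{1 + N e^{2}} $$

Here p_c and p_e are the proportions in the two groups being compared (q = 1 - p), s is the standard deviation, d is the difference to be detected, and C is a constant that depends on the chosen significance level and power.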
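The three formulas can be sketched in code under a stated reading of the slide: Equation 1 as n = C(p_c q_c + p_e q_e)/d² + 2/d + 2 for comparing two proportions, Equation 2 as n = 1 + 2C(s/d)² for comparing two means, and Equation 3 as n = N/(1 + Ne²). The slide does not say how C is obtained; the form C = (z for alpha/2 + z for power)² used below is an assumption, as are all function names and example inputs.

```python
import math
from statistics import NormalDist

def constant_c(alpha=0.05, power=0.80):
    """Assumed form of C: (z_{alpha/2} + z_{power})^2. Not stated on the slide."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2

def n_two_proportions(p_c, p_e, c):
    """Equation 1 (as read here): size per group for comparing two proportions."""
    d = abs(p_c - p_e)
    return c * (p_c * (1 - p_c) + p_e * (1 - p_e)) / d**2 + 2 / d + 2

def n_two_means(s, d, c):
    """Equation 2 (as read here): size per group for comparing two means."""
    return 1 + 2 * c * (s / d) ** 2

def n_yamane(population, e):
    """Equation 3: sample size for a finite population of the given size."""
    return population / (1 + population * e**2)

c = constant_c()  # about 7.85 for alpha = 0.05 and power = 0.80
print(round(c, 2))
print(math.ceil(n_two_proportions(0.40, 0.60, c)))  # hypothetical proportions
print(math.ceil(n_two_means(s=10, d=5, c=c)))       # hypothetical s and d
print(math.ceil(n_yamane(10_000, 0.05)))            # 10,000 students, e = 5%
```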

Other Considerations

Assumes simple random sampling

Number needed for data analysis (e.g. multiple regression analysis and log-linear analysis require a bigger sample than simple descriptive analysis)

Sample size increased by 30% to compensate for non-response; 10% to compensate for persons who cannot be reached
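A small sketch of the adjustment for non-response and unreachable persons. Whether the two percentages are added together or applied one after the other is not stated; adding them is an assumption made here, and the function name is illustrative.

```python
import math

def inflate_sample_size(n, non_response_pct=30, unreachable_pct=10):
    """Increase n by the given percentages (added together, an assumption)."""
    return math.ceil(n * (100 + non_response_pct + unreachable_pct) / 100)

print(inflate_sample_size(350))         # both adjustments
print(inflate_sample_size(350, 30, 0))  # non-response only
```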

Calculation Using Computer Programmes

Epi Info

Online software, e.g. Raosoft

EXAMPLE: Sample Size Calculation

Yamane’s formula:

$$ n = \frac{N}{1 + N e^{2}} $$

Where
  n = Sample size
  N = Population size
  e = Level of precision or sampling error, which is ±5%

*Reference: Yamane, Taro. 1967. Statistics, An Introductory Analysis, 2nd Ed. New York: Harper and Row.

# of Health Facilities per Province

Columns: NC = National Central; PT = Provincial Tertiary; Reg = Regional; Dist = District; TB = Specialised-TB; Psy = Specialised-Psychiatric; Orth = Specialised-Orthopaedic; PA = Provincially Aided; Clinic = Public/Private clinic; CHC = chc; Priv = private hospital; Total = Total No. Health Facilities

Province         NC   PT  Reg  Dist   TB  Psy  Orth   PA  Clinic  CHC  Priv  Total
Eastern Cape      1    6    2    45   10    4     1   18     665   31    16    783
Free State        1    0    5    25    0    1     0    0     231   30    14    293
Gauteng           4    0   11     8    3    4     0    0     323   30    93    383
Northern Cape     0    1    1    20    3    0     0    0      83   16     5    124
KwaZulu Natal     1    2   13    43    7    4     0    0     524   16    24    610
North West        0    5    0    26    0    2     0    0     310   55     7    398
Mpumalanga        0    2    3    23    5    0     0    0     209   38     6    280
Limpopo           0    2    6    32    1    2     0    0     430   26     3    499
Western Cape      3    0    8    22    2    5     0   15     358   72    33    485
Total            10   18   49   244   31   22     1   33    3133  314   201   3855

Source: Digital Healthcare Solutions (PTY) LTD. Comprehensive Health Services Information for Southern Africa: Hospital & Nursing YearBook, 2007.

Sample Size Calculation:

$$ n = \frac{N}{1 + N e^{2}} \approx 350 $$

Total number of health facilities in the study: 350
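As a check on the arithmetic, the sketch below applies Yamane's formula n = N/(1 + Ne²) to the 3,855 facilities listed in the table, with e = 0.05. Taken exactly, this gives about 362 rather than 350, so the reported figure presumably reflects rounding or a slightly looser precision; that reading is an assumption, as are the function names.

```python
import math

def yamane_sample_size(population, e):
    """Yamane's formula: n = N / (1 + N * e^2)."""
    return population / (1 + population * e**2)

def implied_precision(population, n):
    """Level of precision e that makes Yamane's formula return n."""
    return math.sqrt((population / n - 1) / population)

n = yamane_sample_size(3855, 0.05)
e_for_350 = implied_precision(3855, 350)
print(f"n at e = 5%: {n:.1f}")
print(f"e implied by n = 350: {e_for_350:.4f}")
```

A sample of 350 corresponds to a precision of roughly ±5.1%.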

Sampling Techniques

Multi-Stage Sampling
  Primary sampling unit
  Stratification by district (Selection Bias)
  Levels of Care
  Rural/Urban
  Sample proportional to size
  Sampling weight:

Province         Total # of health facilities   Weighted Sample
Eastern Cape                              783                71
Free State                                293                27
Gauteng                                   383                35
Northern Cape                             124                11
KwaZulu-Natal                             610                55
North West                                398                36
Mpumalanga                                280                25
Limpopo                                   499                45
Western Cape                              485                44
Total                                    3855               350
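The weighted sample column can be reproduced by allocating the 350 facilities in proportion to each province's share of the 3,855 total. The sketch assumes simple rounding to the nearest whole number; the deck does not state the rounding rule.

```python
# Facility counts per province, as listed in the table
facilities = {
    "Eastern Cape": 783,
    "Free State": 293,
    "Gauteng": 383,
    "Northern Cape": 124,
    "KwaZulu-Natal": 610,
    "North West": 398,
    "Mpumalanga": 280,
    "Limpopo": 499,
    "Western Cape": 485,
}
target = 350
total = sum(facilities.values())

# Proportional allocation: province share of the total, times the target
allocation = {
    province: round(target * count / total)
    for province, count in facilities.items()
}

for province, k in allocation.items():
    print(f"{province:<15}{facilities[province]:>6}{k:>5}")
print(f"{'Total':<15}{total:>6}{sum(allocation.values()):>5}")
```

Each provincial figure matches the table, but simple rounding sums to 349, so one more facility has to be assigned by some additional rule to reach 350.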

Sampling Techniques

# of Facilities Selected for the study

Columns: NC = National Central; PT = Provincial Tertiary; Reg = Regional; Dist = District; TB = Specialised-TB; Psy = Specialised-Psychiatric; Orth = Specialised-Orthopaedic; PA = Provincially Aided; Clinic = clinic; Hosp = Hospices; CHC = chc; Priv = private hospital; Total = Weighted Sample

Province         NC   PT  Reg  Dist   TB  Psy  Orth   PA  Clinic  Hosp  CHC  Priv  Total
Eastern Cape      1    2    1     5    1    1     1    4      47     3    3     1     70
Free State        1    0    1     2    0    1     0    0      16     2    3     1     27
Gauteng           1    0    2     1    1    1     0    0      18     2    3     8     36
Northern Cape     0    1    1     3    1    0     0    0       2     1    1     1     11
KwaZulu Natal     1    1    2     4    1    1     0    0      40     2    1     2     55
North West        0    1    0     2    0    1     0    0      25     1    5     1     36
Mpumalanga        0    1    1     2    1    0     0    0      15     1    3     1     25
Limpopo           0    1    2     3    1    1     0    0      34     0    2     1     45
Western Cape      1    0    2     2    1    1     0    1      24     2    7     3     44
Total             5    7   12    24    7    7     1    5     221    14   28    19    350

BIBLIOGRAPHY

Israel GD. (1992) Sampling the evidence of extension program impact. University of Florida IFAS Extension PEOD5. (http://edis.ifas.ufl.edu.)

Israel GD. (1992) Determining Sample Size. University of Florida IFAS Extension PEOD6 (http://edis.ifas.ufl.edu.)

Portney LG and Watkins MP. (2000). Foundations of clinical research – applications to practice. 2nd Ed. Chapter 8 - Sampling

“I have collected a poesy of another man’s roses, and nothing but the thread that binds them together is my own”
