Sample size

Population: the totality of all available units in a defined area that fall within the scope or interest of the study investigator

Units may be individuals, households, families, schools, communities, villages, insects, hospitals and so forth

The population from which the data are actually collected is the survey or study population

If information is required about the health of pre-school children then the population will be that of all children less than 5 years of age

It should be stated whether the results will be valid for the whole country, a county, or particular hospitals

Similarly, if a study is on primary health workers in Kenya, then all primary health workers, regardless of cadre, are the population of interest

A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005)

Factors that influence sample size:

RESOURCES: TIME, MONEY, PERSONNEL

Study design (prospective or retrospective)

Type of analysis required, particularly cross-tabulations

Categories of the variables

Sample sizes used in previous studies by reputable researchers

Usually the population is far too large to be studied in full within a reasonable time

A representative sample of the population is therefore selected

The process of sampling involves defining the population

What is your population of interest? To whom do you want to generalize your results? For example: all doctors, school children, Kenyans, women aged 15-45 years, or another group

Can you sample the entire population?

Three factors influence how representative a sample is:
◦ Sampling procedure
◦ Sample size
◦ Participation (response)

When might you sample the entire population?
◦ When your population is very small
◦ When you have extensive resources
◦ When you don't expect a very high response

SAMPLING BREAKDOWN


TARGET POPULATION

STUDY POPULATION

SAMPLE

Probability (random) sampling
◦ Simple random sampling
◦ Systematic random sampling
◦ Stratified random sampling
◦ Multistage sampling
◦ Multiphase sampling
◦ Cluster sampling

Non-probability sampling
◦ Convenience sampling
◦ Purposive sampling
◦ Quota sampling
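Here is a minimal Python sketch of three of the probability methods listed above. It assumes the sampling frame is a plain Python list of unit identifiers. The function names and the "even/odd" strata in the usage are hypothetical and serve only as illustrations.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Simple random sampling: draw n distinct units, each with chance n/N."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Systematic sampling: random start in the first interval, then every k-th unit."""
    rng = random.Random(seed)
    k = len(frame) // n              # sampling interval
    start = rng.randrange(k)         # random start between 0 and k - 1
    return frame[start::k][:n]

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Stratified sampling: an independent simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for name, units in strata.items():
        sample.extend(rng.sample(units, n_per_stratum[name]))
    return sample

# Usage: a hypothetical frame of 100 numbered households
households = list(range(1, 101))
srs = simple_random_sample(households, 10, seed=1)
sys_sample = systematic_sample(households, 10, seed=1)
strat = stratified_sample(
    households,
    stratum_of=lambda h: "even" if h % 2 == 0 else "odd",  # hypothetical strata
    n_per_stratum={"even": 5, "odd": 5},
    seed=1,
)
```

Each method uses chance alone to pick the units. None of them lets the investigator choose who is included, and that is what separates them from the non-probability methods.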


The sampling process comprises several stages:
◦ Defining the population of concern
◦ Specifying a sampling frame, a set of items or events possible to measure
◦ Specifying a sampling method for selecting items or events from the frame
◦ Determining the sample size
◦ Implementing the sampling plan
◦ Sampling and data collecting
◦ Reviewing the sampling process


In the most straightforward case, such as the sentencing of a batch of material from production (acceptance sampling by lots), it is possible to identify and measure every single item in the population and to include any one of them in our sample. However, in the more general case this is not possible. There is no way to identify all rats in the set of all rats. Where voting is not compulsory, there is no way to identify which people will actually vote at a forthcoming election (in advance of the election).

As a remedy, we seek a sampling frame which has the property that we can identify every single element and include any in our sample.

The sampling frame must be representative of the population


A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined.

When every element in the population does have the same probability of selection, this is known as an 'equal probability of selection' (EPS) design. Such designs are also referred to as 'self-weighting' because all sampled units are given the same weight.
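A short sketch can check the inclusion probability and the resulting weight. It assumes simple random sampling without replacement within each stratum. Under that assumption each unit's probability of selection is n/N and its design weight is N/n. The urban/rural population and sample figures are hypothetical. Fractions keep the comparison of probabilities exact.

```python
from fractions import Fraction

def inclusion_probability(n, N):
    """Simple random sampling without replacement: each unit has chance n / N."""
    return Fraction(n, N)

def design_weight(n, N):
    """Weight = 1 / inclusion probability = number of population units each sampled unit represents."""
    return Fraction(N, n)

def is_eps(strata):
    """EPS (self-weighting) design if every stratum has the same inclusion probability."""
    return len({inclusion_probability(n, N) for n, N in strata.values()}) == 1

# Hypothetical population: 8000 urban and 2000 rural households, as (sample size, population size)
proportional = {"urban": (320, 8000), "rural": (80, 2000)}   # both strata 1/25
equal_size = {"urban": (200, 8000), "rural": (200, 2000)}    # 1/40 versus 1/10
```

Under equal allocation, each sampled rural household stands for 10 households but each urban one stands for 40. Estimates from that design therefore have to be weighted. Proportional allocation gives every unit the same chance and makes the design self-weighting.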


Non-probability sampling is any sampling method in which some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage' or 'undercovered'), or in which the probability of selection cannot be accurately determined. It involves selecting elements based on assumptions about the population of interest, which form the criteria for selection. Hence, because the selection of elements is nonrandom, non-probability sampling does not allow the estimation of sampling errors.

Example: We visit every household in a given street, and interview the first person to answer the door. In any household with more than one occupant, this is a non-probability sample, because some people are more likely to answer the door (e.g. an unemployed person who spends most of their time at home is more likely to answer than an employed housemate who might be at work when the interviewer calls) and it's not practical to calculate these probabilities.

Too few subjects makes estimates unreliable, imprecise, and of low power

Too many subjects is a needless waste of resources

Need to strike a balance between cost and precision

Precision: a measure of the consistency of estimates
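The cost-precision trade-off can be illustrated with the standard error of a sample mean, SD/√n. Precision therefore improves only with the square root of the sample size. This sketch uses an SD of 32 purely as an illustrative value.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

for n in (25, 100, 400):
    print(f"n = {n:3d}: SE = {standard_error_of_mean(32, n):.1f}")
```

Quadrupling the sample from 25 to 100 only halves the standard error, from 6.4 to 3.2. Each further gain in precision costs disproportionately more subjects.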

PRIMARY OUTCOME MEASURE (qualitative or quantitative?)

Smallest effect of interest: how small a difference is to be detected, i.e. the magnitude of the effect that is clinically important and that we do not want to overlook.

Significance level: the cut-off level below which we will reject the null hypothesis, i.e. the maximum probability of incorrectly concluding that there is an effect. We usually fix this at 0.05, or occasionally 0.01, and reject the null hypothesis if the P value is less than this value.

STATISTICAL POWER TO DETECT AN ACTUAL DIFFERENCE

Variability in measurement

Study design
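The Z values in the sample size formulas of this section come from the standard normal distribution. $Z_{1-\alpha/2}$ corresponds to a two-sided significance level α, and $Z_{1-\beta}$ corresponds to power 1 − β. This sketch uses the standard library's `statistics.NormalDist` (Python 3.8 or later). The helper names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0, sigma=1)

def z_for_significance(alpha):
    """Two-sided critical value Z_{1-alpha/2}."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

def z_for_power(power):
    """Z_{1-beta} for the desired power (1 - beta)."""
    return STD_NORMAL.inv_cdf(power)

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: Z = {z_for_significance(alpha):.2f}")
for power in (0.80, 0.90):
    print(f"power = {power}: Z = {z_for_power(power):.2f}")
```

Rounded to two decimals, these give the familiar 1.96 (α = 0.05), 2.58 (α = 0.01), 0.84 (80% power) and 1.28 (90% power).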

◦ Sample size for a single estimate
◦ Sample size to compare two means
◦ Sample size for a single proportion
◦ Sample size for two proportions

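$$n = \frac{Z_{1-\alpha/2}^{2}\,\sigma^{2}}{d^{2}}$$

$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^{2}\,\sigma^{2}}{d^{2}}$$

where σ is the standard deviation of the variable, d the tolerated error (or the difference to be detected), $Z_{1-\alpha/2}$ the standard normal value for the significance level (1.96 for α = 0.05) and $Z_{1-\beta}$ the value for the required power (0.84 for 80% power).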

Prevalence of the outcome measure (if qualitative)

Standard deviation of the variable in the population (if quantitative)

Calculations can be done manually with the formulas or with the Epi Info software package

A health officer wishes to estimate the mean haemoglobin in a defined community. Preliminary information is that this mean is about 150 g/l with an SD of 32 g/l. If a sampling error of up to 5 g/l in the estimate is to be tolerated, how many subjects should be included in the study?

SD = 32 g/l, d = 5 g/l, Z = 1.96

$$n = \frac{1.96^{2} \times 32^{2}}{5^{2}} = \frac{3.8416 \times 1024}{25} \approx 157.35$$

Rounding up, 158 subjects should be included.
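A small helper can reproduce the haemoglobin calculation above. The sketch assumes Z = 1.96 as in the worked example and rounds up to a whole subject. The optional `z_beta` argument adds the power term of the second single-estimate formula. The function name is hypothetical.

```python
import math

def n_single_mean(sd, d, z_alpha=1.96, z_beta=0.0):
    """n = (Z_{1-alpha/2} + Z_{1-beta})^2 * sd^2 / d^2, rounded up to a whole subject.

    z_beta = 0 gives the estimation formula without a power term.
    """
    n = (z_alpha + z_beta) ** 2 * sd ** 2 / d ** 2
    return math.ceil(n)

# Haemoglobin example: SD = 32, tolerated error d = 5, 95% confidence
print(n_single_mean(sd=32, d=5))  # 158
```

Passing `z_beta=0.84` (80% power) raises the requirement to 322 subjects. This shows how much adding a power requirement costs.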

$$n = \frac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^{2}\,\sigma^{2}}{d^{2}} \quad \text{(per group)}$$

Suppose the prevalence of brucella infection is 2% and the absolute difference to be detected is 0.25%, with 95% confidence. What sample size is required?

p = 0.02, q = 1 − p = 0.98, Z = 1.96, d = 0.0025

$$n = \frac{Z^{2}\,p\,q}{d^{2}} = \frac{1.96^{2} \times 0.02 \times 0.98}{0.0025^{2}} \approx 12{,}047.3$$

Rounding up, about 12,048 subjects are required.
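The brucella example applies the same approach to a single proportion, n = Z²pq/d². In this sketch p and d must be entered as proportions (0.02 and 0.0025), not percentages, and the result is rounded up. The function name is hypothetical.

```python
import math

def n_single_proportion(p, d, z_alpha=1.96):
    """n = Z^2 * p * (1 - p) / d^2, rounded up to a whole subject."""
    q = 1 - p
    return math.ceil(z_alpha ** 2 * p * q / d ** 2)

print(n_single_proportion(p=0.02, d=0.0025))  # 12048
```

The very small tolerated error (0.25 percentage points) accounts for the large n. Halving the required precision to d = 0.005 would cut the sample to roughly a quarter.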

Suppose investigators want to compare heart rate in patients with essential hypertension and high catecholamine levels with heart rate in patients with essential hypertension and low catecholamine levels. They are willing to accept a type I error (incorrectly concluding that there is a difference in heart rate) of 0.05, and they want a probability of 0.80 of detecting a true difference. The investigators decide that a difference of 10 or more beats per minute is clinically significant, and that an estimate of the SD in heart rate is 15 beats per minute.

Calculate the sample size.

Solution:

$$n = 2\left[\frac{(1.96 + 0.84) \times 15}{10}\right]^{2} = 2 \times 4.2^{2} = 35.28$$

Rounding up, n = 36: 36 patients are needed in each group if the investigators want an 80% chance (80% power) of detecting a difference of 10 or more beats per minute.
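The heart-rate calculation as a sketch: the two-means formula gives the number needed per group and rounds up. The defaults assume α = 0.05 (Z = 1.96) and 80% power (Z = 0.84), as in the example. The function name is hypothetical.

```python
import math

def n_two_means(sd, difference, z_alpha=1.96, z_beta=0.84):
    """Per-group n = 2 * ((Z_{1-alpha/2} + Z_{1-beta}) * sd / difference)^2, rounded up."""
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)

# Heart-rate example: SD = 15 beats/min, difference of 10 beats/min
print(n_two_means(sd=15, difference=10))  # 36
```

With 90% power (`z_beta=1.28`) the same comparison needs 48 patients per group.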

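$$n = \frac{\left[Z_{1-\alpha/2}\sqrt{2P_c(1-P_c)} + Z_{1-\beta}\sqrt{P_t(1-P_t) + P_c(1-P_c)}\right]^{2}}{(P_t - P_c)^{2}}$$

where Pc is the proportion with the outcome in the control group and Pt the proportion in the treated group.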

A study involved a trial of J5 antiserum in surgical patients to determine whether it is effective in preventing gram-negative infections.

The investigators want to estimate the sample size needed to detect a reduction in the proportion of patients who experience shock from 10%, the level in the investigators' previous experience (Pc), to 5% or less (Pt) if patients are given transfusions from donors treated with J5.

They are willing to accept a type I error (falsely concluding that there is a difference when there really is none) of 0.05, and they want a 0.90 probability of detecting a true difference.

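$Z_{1-\alpha/2} = 1.96$, $Z_{1-\beta} = 1.28$, Pc = 0.10, Pt = 0.05. Therefore:

$$n = \left[\frac{1.96\sqrt{2 \times 0.10 \times 0.90} + 1.28\sqrt{0.05 \times 0.95 + 0.10 \times 0.90}}{0.05}\right]^{2} = \left(\frac{1.3062}{0.05}\right)^{2} \approx 682.46$$

Rounding up, about 683 patients are needed in each group.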
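Here is a sketch of the two-proportion calculation for the J5 example. It follows the J5 working: Pc appears under the first square root, and both Z values are taken as positive (1.96 and 1.28) and added. The function returns the unrounded value, for comparison with 682.46, together with the rounded-up per-group size. The function name is hypothetical.

```python
import math

def n_two_proportions(p_c, p_t, z_alpha=1.96, z_beta=1.28):
    """Per-group sample size for comparing two proportions.

    n = [z_alpha*sqrt(2*p_c*(1-p_c)) + z_beta*sqrt(p_t*(1-p_t) + p_c*(1-p_c))]^2 / (p_t - p_c)^2
    Returns (unrounded n, n rounded up to a whole patient).
    """
    numerator = (z_alpha * math.sqrt(2 * p_c * (1 - p_c))
                 + z_beta * math.sqrt(p_t * (1 - p_t) + p_c * (1 - p_c)))
    n = numerator ** 2 / (p_t - p_c) ** 2
    return n, math.ceil(n)

raw, per_group = n_two_proportions(p_c=0.10, p_t=0.05)
print(f"{raw:.2f} -> {per_group} patients per group")  # 682.46 -> 683 patients per group
```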

Sample size increases when:
◦ the difference to detect is small
◦ the power is high
◦ the significance level is low
◦ the variation is large
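The two-means formula can check each of the statements above numerically. This sketch takes α and power directly and derives the Z values from `statistics.NormalDist`. The baseline values (SD 15, difference 10, α 0.05, power 0.80) are borrowed from the heart-rate example, and the alternative values are arbitrary illustrations.

```python
import math
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.80):
    """Per-group n = 2 * ((Z_{1-alpha/2} + Z_{1-beta}) * sd / difference)^2, rounded up."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)

baseline = n_per_group(sd=15, difference=10)
scenarios = {
    "smaller difference (5)": n_per_group(sd=15, difference=5),
    "higher power (0.90)": n_per_group(sd=15, difference=10, power=0.90),
    "lower significance level (0.01)": n_per_group(sd=15, difference=10, alpha=0.01),
    "larger SD (25)": n_per_group(sd=25, difference=10),
}
print("baseline:", baseline)
for label, n in scenarios.items():
    print(f"{label}: {n}")
```

The difference to detect enters the formula squared in the denominator. Halving it therefore roughly quadruples the sample size.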

THANK YOU