

International Journal of Epidemiology
© International Epidemiological Association 1994
Vol. 23, No. 6
Printed in Great Britain

A Computer Simulation of Household Sampling Schemes for Health Surveys in Developing Countries

S BENNETT, A RADALOWICZ,* V VELLA** AND A TOMKINS†

* Tropical Health Epidemiology Unit, London School of Hygiene and Tropical Medicine, Keppel St., London WC1E 7HT, UK.
** UNICEF, Uganda Country Office, PO Box 7047, Kampala, Uganda. Current address: SA3PH, The World Bank, 1818 H St. NW, Washington, DC 20433, USA.
† Centre for International Child Health, Institute of Child Health, 30 Guilford St., London WC1N 1EH, UK.

Bennett S (Tropical Health Epidemiology Unit, London School of Hygiene and Tropical Medicine, Keppel St., London WC1E 7HT, UK), Radalowicz A, Vella V and Tomkins A. A computer simulation of household sampling schemes for health surveys in developing countries. International Journal of Epidemiology 1994; 23: 1282-1291.

Background. Cluster sample surveys of health and nutrition in rural areas of developing countries frequently utilize the EPI (Expanded Programme on Immunization) method of selecting households where complete enumeration and systematic or simple random sampling (SRS) is considered impractical. The first household is selected by choosing a random direction from the centre of the community, counting the houses along that route, and picking one at random. Subsequent households are chosen by visiting that house which is nearest to the preceding one.

Methods. Using a computer, and data from a survey of all children in 30 villages in Uganda, we simulated the selection of samples of size 7, 15 and 30 children from each village using SRS, the EPI method, and four different modifications of the EPI method.

Results. The choice of sampling scheme for households had very little effect on the precision or bias of estimates of prevalence of malnutrition, or of recent morbidity, with EPI performing as well as SRS. However, the EPI scheme was inefficient and showed bias for variables relating to child care and for socioeconomic variables. Two of the modified EPI schemes (taking every fifth house and taking separate EPI samples in each quarter of the community) performed in general much better than EPI and almost as well as SRS.

Conclusions. These results suggest that the unmodified EPI household sampling scheme may be adequate for rapid appraisal of morbidity prevalence or nutritional status of communities, but that it may not be appropriate for surveys which cover a wider range of topics such as health care, or seek to examine the association of health or nutrition with explanatory factors such as education and socioeconomic status. Other factors such as cost and the ability to monitor interviewers' performance should also be taken into account.

Cluster sample surveys are frequently used for the assessment of the health status of communities in developing countries. A sample of communities is selected, perhaps in several stages, and within each selected community a sample of households is selected.¹ Selection of households within a community should ideally be at random, and in practice this is most closely achieved by systematic selection from a numbered list of households. In many situations, however, there is no list or map of households available, and if the investigator does not have the resources to completely enumerate and map all the households in the community, some compromise method must be used.

A common alternative approach is the EPI household sampling scheme, developed by the World Health Organization's Expanded Programme on Immunization² for estimating vaccination coverage. In this procedure, in a rural community, the first household is selected by choosing a random direction from the centre of the community, counting the houses along that route, and picking one at random. If this household contains a child in the target age range (usually 12-23 months) it is included in the sample. The procedure for visiting subsequent households is to choose that house which is nearest to the preceding one. This process continues until the required number of individuals in the target range have been recruited.

This sampling procedure is simple to carry out, needing no mapping or listing of households, but has the disadvantages that the first house is chosen by a procedure that is somewhat biased, and that the sample is concentrated in one part of the community. These may not be problems if the individuals sought are found only relatively infrequently. A child in a narrow age range such as 12-23 months may be found only once in every four households, resulting in the sample being spread more widely, but a subject who is more common, say a child aged 0-4 years, or a woman aged 15-44, may be found in almost every household, resulting in a sample which is very tightly concentrated about the initial house.
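The effect of the target group's frequency on the spread of the sample can be illustrated with a back-of-envelope calculation. The sketch below assumes, purely for illustration, that each household independently contains exactly one eligible individual with probability p; this simplification and the value 0.9 for the wider group are assumptions, not figures from the survey. Under it, the expected number of households visited to recruit n individuals is n/p.

```python
def expected_households_visited(n_required: int, p_eligible: float) -> float:
    """Expected households visited to recruit n_required individuals.

    Illustrative assumption: each household independently holds exactly one
    eligible individual with probability p_eligible.
    """
    if not 0 < p_eligible <= 1:
        raise ValueError("p_eligible must be in (0, 1]")
    return n_required / p_eligible


# One eligible child in every four households (narrow age range, from the text).
narrow = expected_households_visited(7, 0.25)
# Eligible subject in almost every household (0.9 is a hypothetical value).
wide = expected_households_visited(7, 0.9)
print(narrow, round(wide, 1))
```

Under these assumptions a sample of seven children in the narrow age range draws on about 28 consecutive houses, whereas the wider target group is exhausted within about eight houses of the starting point.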

A sample selected in this way may not be representative of the entire community. If such bias is consistent between communities, it will lead to bias in the overall estimates of prevalence, coverage etc. If not, it will lead to a decrease in the precision of these estimates, leading to wider confidence intervals, and to less power to detect significant differences between subgroups.

The validity of the EPI method has been evaluated for its original purpose of estimating immunization coverage,³,⁴ but not in other contexts such as maternal and child health, nutritional status or other aspects of primary health care (PHC), where the age range of the individuals studied is likely to be wider. Amendments to the scheme aimed at making it more representative have been suggested,¹,⁵ such as taking every fifth house, dividing each community into quarters and selecting a separate EPI sample from each one, or taking part of the sample from the centre of the community and part from the periphery.⁶

In this paper we use computer simulation to select repeated samples from a rural Ugandan population using simple random sampling (SRS), EPI sampling and four adaptations of the EPI scheme. We evaluate the effect of these sampling schemes on the bias and precision of estimates of a range of indicators of child health, nutritional status, health care and associated socioeconomic factors. We discuss the implications of our findings for the conduct of future surveys of this kind. A review and more general appraisal of the EPI method has appeared elsewhere.⁷

METHODS

The Survey and the Data
In March and April 1988 a baseline survey was carried out in Mbarara District in south-west Uganda in preparation for a UNICEF/Uganda Ministry of Health PHC project in that area. Thirty communities (villages) were selected and a complete census of each village taken. A household was defined as all those sharing a common cooking pot. These villages contained a total of 2532 households, ranging from 51 to 153 per village, and 4320 children under 5 years (range 86-238 per village) of whom data on 4129 (96%) were included in this study. For each household, information was collected on socioeconomic factors, water supply and sanitation etc. Each child under 5 years was measured for weight, length (<36 months) or height (>36 months), and mid-upper arm circumference (MUAC) according to standardized methods, and his or her morbidity for the past 2 weeks was recorded. The prevalence of malnutrition, morbidity and socioeconomic indicators are described elsewhere.⁸ All houses in the village were accurately mapped.⁹

The variables used in the computer simulation are shown in Table 1. For the purpose of this study each variable has been treated simply as a dichotomy. Not more than 6% of the data were missing for any variable.

TABLE 1 Variables considered in the simulation study

Variable                     Meaning                                               Prevalence (%)
Nutrition
  Height for age             Height for age z-score <-2.0                          32.0
  Weight for age             Weight for age z-score <-2.0                          18.1
  Weight for height          Weight for height z-score <-2.0                       3.7
  MUAC                       Mid-upper arm circumference <13.5 cm                  20.7
Morbidity
  Fever                      Child had fever in previous 2 weeks                   7.9
  Diarrhoea                  Child had diarrhoea in previous 2 weeks               8.8
  Respiratory infection      Child had respiratory infection in previous 2 weeks   16.2
Health care
  Breastfed                  Child is not currently breastfed                      66.5
  Pregnant                   Mother is pregnant                                    15.6
  Weighed in last 3 months   Child was not weighed in previous 3 months            92.4
  Growth chart available     Growth chart is not available                         64.8
  Interpret chart            Mother cannot interpret growth chart                  87.8
Socioeconomic
  Mother's education         Mother has ≤1 year of education                       50.4
  Father's education         Father has ≤1 year of education                       74.3
  Ethnic group               Father's ethnic group is not Banyankole               19.9
  Religion                   Father is Protestant                                  54.9
  Subsistence farmer         Father is a subsistence farmer                        41.9
  Keeps cattle               Father keeps cattle                                   19.6
  Grows a cash crop          Father grows a crop for sale                          12.2

The Sampling Schemes
In immunization coverage surveys² seven children are sampled from each community. To represent this design and the larger sample sizes used elsewhere, we took samples of size 7, 15 and 30 children from each community. The following schemes for the selection of households within each community were simulated:

(i) SRS: Simple random sampling. Each house in the community is given a unique number from 1 to n (where n is the total number of houses). A sample of houses is then selected using a table of random numbers.

(ii) EPI: The EPI method.
    (a) Selection of the first household: The investigator stands at a central point in the community and chooses a direction at random (e.g. by spinning a pen in the air and seeing how it lands). S/he counts the houses between the centre and the edge of the community in that direction. One of these houses is selected at random.
    (b) Selection of subsequent households: The investigator chooses the house whose door is nearest to the door of the household s/he has just left.

(iii) EPI3: The first household is selected by the EPI strategy (ii,a) above. Subsequent houses are selected by choosing a random direction and selecting the third nearest house in that direction.

(iv) EPI5: As (iii), but the fifth nearest household is selected.

(v) QTR: The community is divided into four quadrants; the EPI strategy (ii) is then used independently to select a quarter of the sample from each quadrant, starting at a central point in each quadrant.

(vi) PERI: Half of the sample is selected at the centre of the community and half at the periphery. A random direction is taken from the centre, and the first house in that direction is visited. Half of the sample is selected by visiting in turn the nearest households as in (ii,b). The investigator then returns to the centre, chooses again a random direction and visits the last house in that direction. The remainder of the sample is selected, again as in (ii,b).

If there was more than one child aged <5 years in a household, then all were included in the sample. Households were sampled from the community until the required sample size (in terms of children) was achieved or surpassed. Each sampling scheme was simulated 1000 times for each sample size.
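The stopping rule described above (keep adding households until the number of children reaches or passes the target, taking every eligible child in each household) can be sketched for the SRS scheme as follows. This is a minimal illustration rather than the authors' program: the function name, the toy village and the fixed seed are all assumptions made here.

```python
import random


def srs_until_target(children_per_household, target_children, rng):
    """Draw households by SRS without replacement until the sample holds at
    least target_children children; every child in a drawn household is kept.

    Returns (list of household indices, number of children sampled).
    """
    order = list(range(len(children_per_household)))
    rng.shuffle(order)  # a random permutation gives SRS without replacement
    chosen, n_children = [], 0
    for h in order:
        if n_children >= target_children:
            break
        chosen.append(h)
        n_children += children_per_household[h]
    return chosen, n_children


# Hypothetical village: number of children under 5 in each of 10 households.
village = [0, 1, 2, 1, 0, 3, 1, 2, 1, 1]
rng = random.Random(1988)  # fixed seed so the sketch is repeatable
runs = [srs_until_target(village, 7, rng) for _ in range(1000)]
```

Because whole households are taken, the achieved sample can exceed the target by up to one household's worth of children minus one, which is why the text speaks of the size being "achieved or surpassed".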

Measures of Effectiveness
For each simulation of the sampling procedure, we estimated the sample prevalence of each of the attributes listed in Table 1. We used the following measures⁴ to summarize the performance of the various sampling schemes:

BIAS: The mean value of the sample prevalence over all 1000 simulations minus the expected prevalence under SRS. (In taking an unweighted SRS of equal size from each community, the expected prevalence is the mean of the 30 community population prevalences.)

VARIANCE: The sample variance of the 1000 sample means. This estimates the expected variance of the mean of a single sample, and its square root estimates the standard error of the mean.

MSE: Mean square error, equal to bias² + variance; a measure of the total error.

HDEFF: 'Household design effect'; the ratio of the variance for the given sampling scheme to the corresponding variance achieved under SRS of households. This will be one component of the 'design effect'¹⁰ (deff) which measures the increase in variance of the complete sampling design, including stratification and cluster sampling, compared to an SRS of individuals taken from the entire region. Confidence intervals for the population mean will be wider by a factor √hdeff than if households had been sampled by SRS, and sample sizes should be larger by a factor hdeff to compensate for this.

Values of bias, variance and MSE in the Tables are presented on the same percentage scale as the prevalences in Table 1.
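The four measures defined above translate directly into a few lines of code. The sketch below is one possible reading of the definitions, not the authors' program; the function names and the dictionary layout are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def expected_prevalence(community_prevalences):
    """Unweighted mean of the community population prevalences (in %)."""
    return mean(community_prevalences)


def summarize(estimates, expected, srs_variance=None):
    """Summarize simulated sample prevalences (in %) for one variable.

    estimates:    one sample prevalence per simulation run
    expected:     expected prevalence under SRS
    srs_variance: variance of the same variable under SRS, if available
    """
    bias = mean(estimates) - expected
    var = variance(estimates)  # sample variance, n - 1 denominator
    result = {"bias": bias, "variance": var, "mse": bias ** 2 + var}
    if srs_variance is not None:
        hdeff = var / srs_variance
        result["hdeff"] = hdeff
        result["ci_width_factor"] = sqrt(hdeff)  # CI widening relative to SRS
    return result


demo = summarize([10.0, 12.0, 14.0], expected=11.0, srs_variance=2.0)
```

As a worked case from the paper's own figures, a variance of 10.6 under EPI against 7.8 under SRS gives an hdeff of about 1.36, so confidence intervals would be roughly 1.17 times as wide.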

Implementation of the Sampling Schemes
Maps of each village were digitized manually, and the x,y co-ordinates of each household added to its record. Random numbers were generated from the computer's internal clock. Sampling was without replacement, so that no household could appear twice in the same sample.

To select the first household in the EPI and associated schemes the centre of the village was defined by the median x and y co-ordinates of all the houses in the village. From this point a random direction was generated. All houses within a fixed short distance of a line drawn in this direction were considered to be on the 'path'. The number of houses on this path was counted, and one selected at random. Subsequent households were chosen by selecting that house which was nearest, and which had not been selected before in this sample.
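The EPI selection just described can be sketched as below. Several details are assumptions made here because the text does not specify them: the value of the 'fixed short distance' (`path_halfwidth`), redrawing the direction when no house lies on the path, and stopping after a fixed number of households rather than a target number of children. The function name is likewise hypothetical.

```python
import math
import random
from statistics import median


def epi_select(houses, n_households, rng, path_halfwidth=0.5, max_tries=1000):
    """Return indices of houses chosen by an EPI-style walk.

    houses: list of (x, y) co-ordinates, one per household.
    path_halfwidth: assumed 'fixed short distance' from the line of travel.
    """
    # Village centre: median x and median y of all houses.
    cx = median(x for x, _ in houses)
    cy = median(y for _, y in houses)

    on_path = []
    for _ in range(max_tries):
        theta = rng.uniform(0.0, 2.0 * math.pi)  # random direction
        dx, dy = math.cos(theta), math.sin(theta)
        # Keep houses ahead of the centre and close to the line of travel.
        on_path = [
            i for i, (x, y) in enumerate(houses)
            if (x - cx) * dx + (y - cy) * dy >= 0.0
            and abs((x - cx) * dy - (y - cy) * dx) <= path_halfwidth
        ]
        if on_path:
            break  # assumption: redraw the direction if the path is empty
    if not on_path:
        raise RuntimeError("no house on any path; widen path_halfwidth")

    current = rng.choice(on_path)  # first household, picked at random
    selected = [current]
    remaining = set(range(len(houses))) - {current}
    while len(selected) < n_households and remaining:
        x0, y0 = houses[current]
        # Nearest house not already in this sample (sampling without replacement).
        current = min(
            sorted(remaining),
            key=lambda i: math.hypot(houses[i][0] - x0, houses[i][1] - y0),
        )
        remaining.remove(current)
        selected.append(current)
    return selected
```

Straight-line distance between house co-ordinates stands in for the door-to-door distance an interviewer would judge in the field.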

RESULTS
For reasons of space, results are shown only for sample sizes 7 and 30; those for sample size 15 are mentioned in the text.

Variance and Household Design Effect
Table 2 shows that for sample size 7, there is little difference in variance between SRS, EPI3, EPI5 and QTR, with most of the hdeffs very close to one. The variances of the PERI scheme are much reduced compared to SRS, 10 of the 19 variables having hdeffs of less than 0.8. However, for EPI sampling high hdeffs are shown for a few variables, notably ability to interpret a growth chart and father's ethnic group, the former showing an increase in variance of 36%. Hdeffs for nutritional variables are mostly less than one for the EPI scheme.

For sample size 30 (Table 3), the hdeffs for EPI3, EPI5 and QTR are again very close to unity (except for possession of a growth chart). EPI sampling appears to be more efficient than SRS for measuring nutritional status and morbidity, but its inefficiency is increased for other variables, with hdeff over 1.2 for six variables. The hdeffs for PERI are still low, but higher than for sample size 7. Untypically, EPI sampling provides particularly low variances for the prevalence of breastfeeding in all sample sizes.

The picture for sample size 15 (data not shown) is similar to that for sample size 30. Variances are even larger for EPI, with seven variables having hdeff over 1.2 (1.40 for interpretation of growth chart), and closer to unity for nutritional and morbidity variables. Variances are again small for PERI, and the other sampling schemes again have variances close to those of SRS, except for possession of a growth chart.

EPI5 generally has lower variances than EPI3 for the larger sample sizes, with QTR looking the best of the three when the sample size increases to 30.

For no sample size, nor for any sampling scheme, are large hdeffs seen for any of the nutritional status variables (height for age, weight for age, weight for height and mid-upper arm circumference) or for morbidity (fever, diarrhoea or respiratory infection in previous two weeks), mother's or father's education, breastfeeding or pregnancy status. It is socioeconomic and cultural variables (being a subsistence farmer, growing a cash crop, father's religion and ethnic group) and health care variables (whether child had been weighed recently, availability and interpretation of growth chart) that are affected most by the choice of sampling scheme.

Bias and Mean Square Error
Table 4 shows bias and mean square error for the smallest sample size. The PERI scheme frequently leads to considerable bias, the magnitude being over 3% for possession of cattle, and frequently over 2% (note that this is an absolute, not relative, percentage bias). The bias of SRS is close to zero and for EPI3, EPI5 and QTR is almost always less than 1%, but EPI leads to a bias of more than 1% for seven variables. The biases of all the sampling schemes become slightly smaller for sample size 15 (data not shown), but large biases are still seen for many variables with PERI, and for some variables with EPI. For sample size 30 (Table 5) biases are smaller again, although a few high values are seen for PERI and EPI.

Biases are most extreme for possession of cattle, and are also high for father's ethnic group, religion and level of education, and for mother's education and being a subsistence farmer for the smaller sample sizes. Interestingly, PERI, EPI, EPI3 and to a lesser extent EPI5, overestimate the prevalence of low height for age whatever the sample size.

The effect of bias and variance is combined in the notion of mean square error (MSE). For sample size 7 (Table 4) EPI sampling shows a high MSE for interpretation of a growth chart, father's ethnic group, religion and level of education, being a subsistence farmer and keeping cattle. The combination of low variance and high bias for PERI sometimes results in a low MSE (weight for age, height for age, breastfeeding and availability of a growth chart), and sometimes in a high MSE (fever, ethnic group, pregnancy and keeping cattle). The MSE for EPI3, EPI5 and QTR are broadly similar to those for SRS.

For sample size 15 (data not shown), EPI sampling shows MSE more than 20% above that of SRS for nine of the 19 variables, and PERI for eight. EPI3 (with 5), EPI5 (3) and QTR (2) also show some high values. Low values are almost non-existent. The picture for sample size 30 is very similar (Table 5), but note the very large MSE for religion and keeping cattle with EPI compared to those with SRS.

TABLE 2 Variance of different sampling schemes with sample size 7 and, in parentheses, household design effect (hdeff), or ratio to SRS variance

Variable                       SRS     EPI            EPI3           EPI5           QTR            PERI
Nutrition
  Height for age               11.9    10.5 (0.88)    11.1 (0.93)    11.1 (0.93)    10.7 (0.90)    8.6 (0.72)c
  Weight for age               7.5     6.6 (0.88)     7.7 (1.03)     7.4 (0.98)     7.2 (0.96)     5.2 (0.69)c
  Weight for height            1.8     1.5 (0.84)     1.8 (1.00)     1.8 (1.00)     1.4 (0.81)     1.1 (0.60)c
  Mid-upper arm circumference  6.7     7.1 (1.06)     7.8 (1.17)     6.8 (1.02)     7.0 (1.05)     6.0 (0.91)
Morbidity
  Fever                        3.5     3.6 (1.02)     3.8 (1.09)     3.8 (1.09)     3.2 (0.92)     2.9 (0.81)
  Diarrhoea                    3.8     3.5 (0.94)     3.4 (0.91)     3.5 (0.94)     3.4 (0.91)     3.1 (0.82)
  Respiratory infection        7.6     8.1 (1.06)     7.7 (1.02)     6.8 (0.90)     6.9 (0.92)     6.9 (0.91)
Health care
  Breastfed                    7.8     5.4 (0.69)c    7.0 (0.91)     7.2 (0.93)     7.1 (0.92)     4.7 (0.61)c
  Pregnant                     9.1     9.7 (1.07)     9.6 (1.06)     8.7 (0.96)     8.2 (0.90)     7.4 (0.81)
  Weighed in last 3 months     4.8     5.5 (1.14)     4.9 (1.02)     4.5 (0.94)     4.1 (0.85)     3.6 (0.75)c
  Growth chart available       13.5    14.7 (1.09)    13.6 (1.01)    13.9 (1.02)    15.0 (1.11)    10.9 (0.80)
  Interpret chart              7.8     10.6 (1.36)b   8.1 (1.04)     7.4 (0.94)     7.3 (0.94)     6.4 (0.82)
Socioeconomic
  Mother's education           17.3    17.8 (1.03)    18.8 (1.09)    18.4 (1.06)    17.9 (1.03)    11.1 (0.64)c
  Father's education           15.1    15.5 (1.03)    14.5 (0.96)    15.4 (1.02)    13.9 (0.92)    11.7 (0.78)c
  Ethnic group                 9.1     11.3 (1.25)b   9.0 (0.99)     8.4 (0.92)     9.6 (1.06)     6.9 (0.76)c
  Religion                     15.8    18.4 (1.16)    17.3 (1.10)    15.7 (0.99)    17.5 (1.11)    12.7 (0.80)
  Subsistence farmer           18.7    21.1 (1.13)    19.5 (1.04)    20.0 (1.07)    18.7 (1.00)    14.9 (0.80)c
  Keeps cattle                 11.9    11.3 (0.95)    11.7 (0.98)    12.9 (1.09)    11.4 (0.97)    8.1 (0.68)c
  Grows a cash crop            8.3     8.3 (1.00)     7.3 (0.88)     8.5 (1.03)     7.6 (0.92)     6.8 (0.82)

For an explanation of the sampling schemes see text.
b hdeff > 1.2
c hdeff < 0.8

TABLE 3 Variance of different sampling schemes with sample size 30 and, in parentheses, household design effect (hdeff), or ratio to SRS variance

Variable                       SRS     EPI            EPI3           EPI5           QTR            PERI
Nutrition
  Height for age               2.2     1.9 (0.89)     2.3 (1.07)     2.2 (1.00)     2.1 (0.99)     1.8 (0.83)
  Weight for age               1.5     1.4 (0.92)     1.4 (0.97)     1.4 (0.98)     1.4 (0.96)     1.1 (0.74)c
  Weight for height            0.34    0.28 (0.82)    0.34 (1.00)    0.33 (0.96)    0.28 (0.83)    0.25 (0.74)c
  Mid-upper arm circumference  1.4     1.2 (0.83)     1.5 (1.05)     1.4 (1.00)     1.4 (1.01)     1.2 (0.85)
Morbidity
  Fever                        0.81    0.60 (0.75)c   0.73 (0.91)    0.69 (0.86)    0.72 (0.90)    0.64 (0.80)c
  Diarrhoea                    0.69    0.56 (0.81)    0.62 (0.90)    0.62 (0.89)    0.67 (0.97)    0.53 (0.77)c
  Respiratory infection        1.5     1.5 (0.96)     1.6 (1.06)     1.4 (0.92)     1.3 (0.85)     1.3 (0.86)
Health care
  Breastfed                    1.5     1.2 (0.81)     1.5 (1.00)     1.4 (0.93)     1.1 (0.73)c    0.9 (0.61)c
  Pregnant                     1.7     1.7 (0.99)     1.9 (1.10)     1.9 (1.12)     1.6 (0.93)     1.6 (0.93)
  Weighed in last 3 months     0.94    1.15 (1.22)b   1.00 (1.06)    0.93 (0.99)    0.94 (1.00)    0.95 (1.01)
  Growth chart available       2.6     3.5 (1.33)b    3.5 (1.32)b    3.1 (1.17)     3.2 (1.21)b    2.9 (1.12)
  Interpret chart              1.6     2.1 (1.29)b    1.7 (1.06)     1.7 (1.01)     1.9 (1.15)     1.8 (1.09)
Socioeconomic
  Mother's education           3.6     3.5 (0.97)     3.9 (1.07)     3.7 (1.03)     3.5 (0.96)     3.0 (0.82)
  Father's education           2.9     2.8 (0.93)     3.2 (1.08)     2.8 (0.94)     2.7 (0.92)     2.6 (0.87)
  Ethnic group                 1.8     2.2 (1.23)b    2.0 (1.14)     1.8 (1.00)     2.0 (1.15)     1.9 (1.07)
  Religion                     3.4     4.4 (1.30)b    3.8 (1.13)     3.3 (0.98)     3.6 (1.06)     4.0 (1.19)
  Subsistence farmer           3.7     4.4 (1.20)     4.4 (1.20)     3.8 (1.04)     3.4 (0.93)     4.2 (1.14)
  Keeps cattle                 2.4     2.9 (1.18)     2.8 (1.12)     2.5 (1.01)     2.5 (1.03)     2.3 (0.92)
  Grows a cash crop            1.6     2.1 (1.30)b    1.9 (1.18)     1.8 (1.10)     1.6 (0.95)     1.7 (1.01)

For an explanation of the sampling schemes see text.
b hdeff > 1.2
c hdeff < 0.8

TABLE 4 Bias and, in parentheses, mean square error (MSE) for sample size 7

Variable                       SRS            EPI            EPI3           EPI5           QTR            PERI
Nutrition
  Height for age               0.01 (11.9)    1.25 (12.1)    1.53 (13.4)    0.82 (11.8)    0.03 (10.7)    1.11 (9.9)
  Weight for age               -0.06 (7.5)    0.17 (6.6)     0.57 (8.1)     0.57 (7.7)     0.34 (7.3)     -0.20 (5.2)c
  Weight for height            0.05 (1.8)     0.23 (1.5)     0.29 (1.8)     0.46 (2.0)     -0.20 (1.5)    -0.23 (1.1)c
  Mid-upper arm circumference  -0.03 (6.6)    -0.65 (7.5)    0.20 (7.8)     -0.19 (6.8)    -0.36 (7.1)    -0.09 (6.0)
Morbidity
  Fever                        -0.02 (3.5)    -0.72 (4.1)    -0.39 (4.0)    -0.36 (4.0)    -0.10 (3.3)    -1.19 (4.3)b
  Diarrhoea                    0.10 (3.8)     -0.32 (3.6)    -0.17 (3.4)    0.09 (3.5)     0.09 (3.4)     -0.36 (3.2)
  Respiratory infection        0.00 (7.6)     -0.22 (8.1)    -0.27 (7.8)    -0.25 (6.9)    0.12 (7.0)     -0.24 (6.9)
Health care
  Breastfed                    0.09 (7.7)     0.87 (6.1)c    0.02 (7.0)     0.33 (7.3)     -0.08 (7.1)    -0.03 (4.7)c
  Pregnant                     0.11 (9.1)     -0.84 (10.4)   -0.27 (9.7)    -0.18 (8.7)    0.41 (8.3)     -2.12 (11.9)b
  Weighed in last 3 months     0.04 (4.8)     -0.14 (5.5)    -0.55 (5.2)    -0.25 (4.6)    0.45 (4.3)     -0.96 (4.5)
  Growth chart available       0.26 (13.6)    -0.38 (14.9)   -0.68 (14.1)   -0.90 (14.7)   1.34 (16.8)b   0.04 (10.9)c
  Interpret chart              0.14 (7.8)     -0.70 (11.1)b  -0.15 (8.1)    0.17 (7.4)     1.15 (8.6)     -0.77 (7.0)
Socioeconomic
  Mother's education           0.01 (17.3)    1.23 (19.3)    -0.03 (18.8)   0.01 (18.4)    -0.93 (18.7)   2.16 (15.7)
  Father's education           -0.05 (15.1)   1.90 (19.1)b   0.63 (14.8)    0.78 (16.0)    0.26 (14.0)    2.49 (17.9)
  Ethnic group                 0.11 (9.1)     2.62 (18.2)b   0.69 (9.5)     0.66 (8.8)     0.05 (9.6)     2.69 (14.2)b
  Religion                     -0.12 (15.8)   -1.79 (21.6)b  -0.70 (17.8)   -0.68 (16.1)   0.38 (17.7)    -0.79 (13.3)
  Subsistence farmer           0.01 (18.7)    1.19 (22.5)b   1.32 (21.3)    1.18 (21.4)    0.57 (19.0)    1.90 (18.5)
  Keeps cattle                 0.02 (11.9)    -3.01 (20.3)b  -1.53 (14.0)   -1.44 (15.0)b  -0.05 (11.5)   -3.45 (20.0)b
  Grows a cash crop            -0.00 (8.3)    -0.01 (8.3)    -0.93 (8.2)    -0.39 (8.7)    0.64 (8.0)     -1.67 (9.5)

For an explanation of the sampling schemes see text.
b MSE > 1.2 x MSE of SRS
c MSE < 0.8 x MSE of SRS

TABLE 5 Bias and, in parentheses, mean square error (MSE) for sample size 30

Variable                       SRS            EPI            EPI3           EPI5           QTR            PERI
Nutrition
  Height for age               0.08 (2.2)     0.81 (2.6)     0.72 (2.8)b    0.44 (2.4)     -0.38 (2.3)    1.27 (3.4)b
  Weight for age               0.02 (1.5)     0.08 (1.4)     0.32 (1.5)     0.37 (1.6)     -0.33 (1.5)    0.35 (1.2)
  Weight for height            0.00 (0.34)    0.10 (0.29)    0.17 (0.37)    0.25 (0.39)    -0.10 (0.29)   0.32 (0.35)
  Mid-upper arm circumference  0.04 (1.4)     -0.37 (1.3)    -0.05 (1.5)    0.02 (1.4)     -0.78 (2.0)b   0.54 (1.5)
Morbidity
  Fever                        0.00 (0.81)    -0.06 (0.61)c  0.08 (0.74)    -0.08 (0.70)   0.09 (0.73)    0.06 (0.64)c
  Diarrhoea                    -0.02 (0.69)   -0.83 (1.26)b  -0.13 (0.63)   0.09 (0.62)    0.03 (0.67)    -0.47 (0.75)
  Respiratory infection        0.00 (1.5)     -0.58 (1.8)    -0.34 (1.8)    -0.40 (1.6)    -0.03 (1.3)    -0.13 (1.3)
Health care
  Breastfed                    -0.02 (1.5)    0.40 (1.4)     0.29 (1.6)     0.16 (1.4)     0.26 (1.2)c    0.81 (1.6)
  Pregnant                     -0.02 (1.7)    0.19 (1.7)     0.22 (1.9)     0.24 (2.0)     -0.00 (1.6)    0.40 (1.8)
  Weighed in last 3 months     0.02 (0.94)    0.38 (1.30)b   0.03 (1.00)    0.20 (0.97)    -0.01 (0.94)   -0.10 (0.96)
  Growth chart available       0.12 (2.7)     -0.18 (3.5)b   -0.41 (3.7)b   -0.29 (3.2)    0.58 (3.5)b    -0.23 (3.0)
  Interpret chart              -0.01 (1.6)    0.26 (2.2)b    0.05 (1.7)     0.21 (1.7)     0.69 (2.4)b    -0.49 (2.0)b
Socioeconomic
  Mother's education           -0.07 (3.6)    0.34 (3.6)     -0.20 (3.9)    -0.21 (3.8)    -0.80 (4.1)    -0.16 (3.0)
  Father's education           -0.15 (3.0)    1.25 (4.3)b    0.47 (3.4)     0.75 (3.3)     0.24 (2.8)     2.28 (7.7)b
  Ethnic group                 -0.07 (1.8)    0.55 (2.5)b    0.16 (2.0)     0.43 (2.0)     -0.20 (2.0)    1.13 (3.2)b
  Religion                     0.09 (3.4)     -1.93 (8.1)b   -0.16 (3.9)    -0.50 (3.6)    -0.49 (3.8)    -1.41 (6.0)b
  Subsistence farmer           0.09 (3.7)     0.09 (4.4)     0.70 (4.9)b    0.82 (4.5)b    -0.71 (3.9)    0.14 (4.2)
  Keeps cattle                 0.07 (2.5)     -2.49 (9.1)b   -0.89 (3.5)b   -1.00 (3.5)b   -0.39 (2.7)    -2.85 (10.4)b
  Grows a cash crop            -0.18 (1.7)    -0.13 (2.1)b   -0.29 (2.0)b   0.05 (1.8)     0.87 (2.3)b    -0.54 (2.0)

For an explanation of the sampling schemes see text.
b MSE > 1.2 x MSE of SRS
c MSE < 0.8 x MSE of SRS

Proportion outside Range
The proportions of estimates which lie outside a given range (±10%, ±5%, ±1%) centred on the true value were also calculated (data not shown) and confirm the pattern shown by the MSE, with PERI and EPI faring particularly poorly for socioeconomic, cultural, educational and health care variables.
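This summary is simple to compute from the simulated estimates. The sketch below is illustrative only; it assumes the ranges are absolute percentage points on the same scale as the prevalences, and the function name and example figures are hypothetical.

```python
def proportion_outside(estimates, true_value, half_width):
    """Fraction of simulated estimates lying more than half_width
    (in percentage points) away from true_value."""
    if not estimates:
        raise ValueError("estimates must not be empty")
    outside = sum(1 for e in estimates if abs(e - true_value) > half_width)
    return outside / len(estimates)


# Hypothetical sample prevalences (%) around a true value of 32%.
example = proportion_outside([30.0, 33.0, 38.0, 45.0], true_value=32.0, half_width=5.0)
```

Unlike the MSE, this measure is insensitive to how far outside the range an estimate falls, so the two summaries are complementary.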

DISCUSSION
The EPI household sampling scheme was designed as part of a rapid and economical survey procedure whose aim was to estimate vaccination coverage in children within a narrow age range. It has since been adopted for many other purposes, and a number of modifications have been proposed and evaluations carried out.⁷ Simulation of household selection from artificial populations by Lemeshow et al.⁴ has indicated that the scheme is less efficient than SRS for realistic models of the distribution of population and of vaccine coverage, although it may perform adequately in its objective of estimation to within 10 percentage points. In this simulation generated from real data from 30 communities in Uganda we have tested the properties of EPI sampling and four variants of it for a range of variables commonly studied in surveys of health and nutritional status, for children aged 0-4 years.

There are important aspects of the comparison of these sampling schemes that are not amenable to study by computer simulation. If the population estimates used for selecting communities with probability proportional to size are out of date, the unweighted survey estimates may be biased. The complete enumeration of households necessary for a systematic or simple random sample will enable this to be corrected, by weighting or adjusting the number of households taken. Selection of specified households from a numbered list also enables supervisors to monitor the work of interviewers by revisiting a subsample of households. This is much more difficult in the EPI scheme. Set against this is the cost saving in not carrying out mapping and enumeration.

The simulations are limited in their capacity to represent the topography of the villages: in practice the choice of random direction to follow would be limited by the roads and paths that existed, and by the existence of barriers such as fences, streams etc. Nor could the simulations consider other practical realities such as interviewer motivation, time constraints, or the views of the head of the community.

A clear feature of the results is that the choice of household sampling scheme has little effect on variables representing nutritional status of children and their recent morbidity, except that the EPI and PERI schemes overestimate (by up to 1% or more) the proportion with low height for age. In estimating educational variables and mother's pregnancy status there is evidence of bias with both PERI and EPI, particularly with the smaller sample sizes.

The EPI scheme has a very large variance compared with SRS for the variables relating to child care (whether weighed or not, presence and understanding of growth chart), particularly for the larger sample sizes. For such variables EPI5 performs as well as SRS, with the other schemes somewhere in between. For the other socioeconomic variables (religion, ethnic groups and variables relating to farming) EPI performs poorly, both in terms of variance and bias. The lower variances achieved by PERI are more than outweighed by the large biases seen with this method. EPI5 and QTR perform as well as SRS, and EPI3 not quite as well.

In general, household design effects increase with the number of households sampled, while the bias decreases a little.

The difference in the performance of the sampling schemes for different variables depends on the way in which these variables are distributed within communities. With variables which show a gradient from the centre to the edge of the community, such as keeping cattle, the EPI and the PERI schemes will yield a similar bias in every community, leading to an overall bias, since they both oversample a small group of houses at the centre of the community. (The EPI method does this because, in selecting the first household, an imaginary radius sweeping 360° round the community will pass over fewer houses near the centre than at the periphery.) A bias which differs between communities will lead to increased variance. Thus variance of the EPI scheme increases for variables such as ethnic group which are likely to cluster within each community: the sample may be wholly within a group in some villages, and wholly outside it in others.
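The geometric argument for the oversampling of central houses can be illustrated with a small simulation. This is a sketch under stated assumptions, not the simulation used in the study: houses are scattered uniformly over a circular community of unit radius, the "route" is taken to be a corridor of fixed half-width along a random direction from the centre, and the function name `simulate_first_household` and all its parameters are hypothetical.

```python
import math
import random


def simulate_first_household(n_houses=500, n_draws=4000, half_width=0.05, seed=1):
    """Return (share of EPI first households in the inner half-radius,
    share of all houses in the inner half-radius)."""
    rng = random.Random(seed)
    # Assumption: houses are uniform over a unit disc (sqrt gives uniform area density).
    houses = []
    for _ in range(n_houses):
        r = math.sqrt(rng.random())
        a = rng.uniform(0.0, 2.0 * math.pi)
        houses.append((r * math.cos(a), r * math.sin(a)))

    inner_hits = 0
    draws = 0
    while draws < n_draws:
        theta = rng.uniform(0.0, 2.0 * math.pi)
        dx, dy = math.cos(theta), math.sin(theta)
        # Houses along the route: ahead of the centre and within half_width of the ray.
        route = [
            (x, y)
            for x, y in houses
            if x * dx + y * dy > 0 and abs(x * dy - y * dx) <= half_width
        ]
        if not route:
            continue  # no house on this route: choose a new direction
        x, y = rng.choice(route)  # pick one house on the route at random
        if math.hypot(x, y) < 0.5:
            inner_hits += 1
        draws += 1

    inner_population = sum(math.hypot(x, y) < 0.5 for x, y in houses) / n_houses
    return inner_hits / n_draws, inner_population
```

Under these assumptions about a quarter of the houses lie within half the radius of the centre, so SRS would select the first household from that group about a quarter of the time, whereas the route-based selection picks it from there considerably more often.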

With the PERI scheme, households at the centre of the community are included in every sample, while households halfway between the centre and the edge of the community will not be included in any sample. This limitation in the range of possible samples is not only the cause of its large bias, but also of the smaller variance. The risk of bias outweighs the low variance and suggests that schemes such as this should not be used.

The large hdeffs of the EPI scheme for educational levels, mother's pregnancy status, and variables relating to child care, indicate the presence of within-community clustering for these variables, and reflect the high design effects seen for complete surveys for socioeconomic factors and factors related to health care.1,11 The low hdeffs shown by the morbidity and nutrition variables for all sampling schemes indicate that neither cases of common illnesses, nor children with poor nutritional status, form clusters within communities, being distributed rather randomly. Such a homogeneity has been observed in the relatively low whole-survey design effects for prevalence of common illnesses from surveys in Guinea12 and Maldives (W Liyanage, unpublished MSc thesis), but contrasts with the high design effects for epidemic diseases such as measles.13

The sample size necessary under SRS sampling of households, assuming that one has already taken account of cluster sampling of communities, will need to be multiplied by a factor hdeff to obtain the same precision with EPI sampling of households. For a survey whose main interest is estimating levels of nutritional status and general morbidity, no increase in sample size is necessary. For surveys whose primary concern is to investigate child care practices and socioeconomic status, the sample size will need to be increased by between 20% and 40%, as shown by the values of hdeff. Whether the bias of 1-2% seen with the EPI scheme is important will depend on the objectives of the survey.
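The adjustment described above is a simple multiplication, sketched here for illustration. The function name `inflate_sample_size` and the starting sample size of 210 (30 villages with 7 children each) are assumptions chosen for the example; the hdeff values of 1.2 and 1.4 correspond to the 20% and 40% increases mentioned in the text.

```python
import math


def inflate_sample_size(n_srs, hdeff):
    """Sample size needed under EPI household selection to match the
    precision of n_srs under SRS of households, given a household
    design effect hdeff."""
    if n_srs <= 0 or hdeff <= 0:
        raise ValueError("n_srs and hdeff must be positive")
    # Round before taking the ceiling to guard against floating-point noise.
    return math.ceil(round(n_srs * hdeff, 9))


# Illustrative values only: hdeff = 1.0 means no increase is needed.
for hdeff in (1.0, 1.2, 1.4):
    print(hdeff, inflate_sample_size(210, hdeff))
```

With these illustrative inputs, a survey of 210 children would grow to 252 children for hdeff = 1.2 and to 294 children for hdeff = 1.4.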

In studying all children aged 0-4 years we examined the effect of each sampling scheme in a situation where almost every household would be eligible for inclusion in the sample. Use of a narrower age range would entail visiting more households in order to find the required sample size. Experience shows that to find seven children aged 12-23 months may entail visiting 20 or more households, so that a sample taken using the EPI scheme may result in a child being selected from approximately every fourth household. The EPI sample would thus have the properties of the EPI3 or EPI5 scheme, and as we have shown, would be likely to perform as well as SRS.

The results of these simulations suggest that the EPI scheme, which is simple and quick, is suitable for rapid appraisal of the prevalence of morbidity or the nutritional status of communities where practical reasons preclude the use of systematic or simple random sampling. However, where surveys cover a wider range of topics, such as health care, or where a survey seeks to examine the association of health or nutritional status with explanatory variables such as education or socioeconomic factors, the unmodified EPI scheme will be inefficient and somewhat biased. Adapting the scheme by taking every fifth house, or by splitting the community into quarters and carrying out the EPI scheme in each quarter, gives results as good as SRS and much better than the unmodified EPI scheme in such cases. The extra work involved is small: a little more walking and, if sampling in quarters, the repetition four times of the selection of the starting household. There is a need for field studies to evaluate both the costs and benefits of such adaptations, and the costs and additional benefits (interviewer control, corrected population estimates) of complete enumeration.

ACKNOWLEDGEMENTS

This work was supported by UNICEF, Uganda. S Bennett and A Radalowicz are supported by the UK Medical Research Council.

REFERENCES

1 Bennett S, Woods A J, Liyanage W M, Smith D L. A simplified general method for cluster sample surveys of health in developing countries. World Health Stat Q 1991; 44: 98-106.

2 World Health Organization. Training for Mid-level Managers: The EPI Coverage Survey. Geneva: WHO Expanded Programme on Immunization, 1991, WHO/EPI/MLM/91.10.

3 Henderson R H, Sundaresan T. Cluster sampling to assess immunization coverage: a review of experience with a simplified sampling method. Bull World Health Organ 1982; 60: 253-60.

4 Lemeshow S, Tserkovnyi A G, Tulloch J L, Dowd J E, Lwanga S K, Keja J. A computer simulation of the EPI survey strategy. Int J Epidemiol 1985; 14: 473-81.

5 El Bindari-Hammad A, Smith D L. Primary Health Care Reviews. Guidelines and Methods. Geneva: WHO, 1989.

6 D'Alessandro U, Aikins M K, Langerock P, Bennett S, Greenwood B M. Nationwide survey of bednet use in rural Gambia. Bull World Health Organ 1994; 72: 391-94.

7 Bennett S. The EPI cluster sampling method: a critical appraisal.Bull Int Statist Inst 1993; 55: Book 2, 21-35.

8 Vella V, Tomkins A, Ndiku J, Marshall T. Determinants of child mortality in south-west Uganda. J Biosoc Sci 1992; 24: 103-12.

9 Vella V. An Epidemiological Analysis of Predictors of Childhood Malnutrition and Mortality in Southwest Uganda. London: University of London, 1991. PhD thesis.

10 Kish L. Survey Sampling. London: Wiley, 1965.

11 Kish L, Groves R M, Krotki K P. Sampling Errors for Fertility Surveys. London: World Fertility Survey, 1976. Occasional paper no. 17.

12 Dabis F, Breman J G, Roisin A J, Haba F, the ASCI-CCCD team. Monitoring selective components of primary health care: methodology and community assessment of vaccination, diarrhoea, and malaria practices in Conakry, Guinea. Bull World Health Organ 1989; 67: 675-84.

13 Rothenberg R B, Lobanov A, Singh K B, Stroh G. Observations on the application of EPI cluster survey methods for estimating disease incidence. Bull World Health Organ 1985; 63: 93-99.

(Revised version received June 1994)