Harmonizing the World’s Census Microdata:
The IPUMS Project
Matt Sobek
Minnesota Population Center
What is IPUMS-International?
Census data – 1960 to present
Samples – 1 to 10%, nationally representative
Microdata – individual-level
Extract system – select variables – pooled data
Downloadable – anonymized
Integrated – consistent codes across time and place
Map of IPUMS Partners
Dark green = disseminating data
Light green = partners, not yet disseminating
83 countries
Current Countries in IPUMS
44 countries
130 samples
279 million persons
Africa: Egypt, Ghana, Guinea, Kenya, Rwanda, South Africa, Uganda
Asia: Armenia, Cambodia, China, India, Iraq, Israel, Jordan, Kyrgyz Rep., Malaysia, Mongolia, Palestine, Philippines, Vietnam
Americas: Argentina, Bolivia, Brazil, Canada, Chile, Colombia, Costa Rica, Ecuador, Mexico, Panama, United States, Venezuela
Europe: Austria, Belarus, France, Greece, Hungary, Italy, Netherlands, Portugal, Romania, Slovenia, Spain, United Kingdom
IPUMS Microdata
Relation to head, marital status, literacy, occupation
Aggregate Data
Data Standardization
Variable  Original label  N        Input code  Input label  Output code  Output label
Sex       Masculin        223,178  1           Male         1            Male
          Feminin         234,369  2           Female       2            Female
          Non Declare     290      3           Undeclared   9            Unknown
School    [blank]         63,239   B           [no label]   0            NIU (not in universe)
          Oui             47,320   1           Yes          1            Attends school
          Non             346,460  2           No           2            Does not attend school
          Non Declare     800      3           Undeclared   9            Unknown
          [blank]         2        5           [no label]   9            Unknown
          [blank]         16       9           [no label]   9            Unknown
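The recoding on this slide can be sketched as a lookup from input code to output code and label. This is only an illustrative sketch, not the project's actual software: the dictionary names, the `standardize` helper, and the fallback to "Unknown" for unlisted codes are assumptions; the code pairs themselves are transcribed from the table.

```python
# Input code -> (output code, output label), transcribed from the slide.
SEX_RECODE = {
    "1": (1, "Male"),
    "2": (2, "Female"),
    "3": (9, "Unknown"),
}

SCHOOL_RECODE = {
    "B": (0, "NIU (not in universe)"),
    "1": (1, "Attends school"),
    "2": (2, "Does not attend school"),
    "3": (9, "Unknown"),
    "5": (9, "Unknown"),  # unlabeled code found in the input data
    "9": (9, "Unknown"),  # unlabeled code found in the input data
}

# Assumption: any code not listed on the slide is treated as unknown.
UNKNOWN = (9, "Unknown")


def standardize(raw_code, recode_table):
    """Return (output code, output label) for one raw input code."""
    return recode_table.get(str(raw_code).strip(), UNKNOWN)


if __name__ == "__main__":
    for raw in ["B", "1", "2", "3", "5", "9"]:
        print(raw, "->", standardize(raw, SCHOOL_RECODE))
```

Keeping the unlabeled codes (5 and 9) in the table, rather than letting them fall through to the default, documents that they were actually observed in the input file.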
Data Integration – Marital Status
MARST Marital Status
Code  Label                    China 1982       Colombia 1973       Kenya 1989    Mexico 1970            U.S.A. 1990
                               CN82A403         CO73A411            KN89A413      MX70A402               US90A425
100   SINGLE/NEVER MARRIED     1=never married  4=single            1=single      9=single               6=never married
200   MARRIED/IN UNION         -                -                   -             -                      -
210   Married (not specified)  2=married        2=married           3=monogamous  -                      1=married
211   Civil                    -                -                   -             3=only civil           -
212   Religious                -                -                   -             4=only religious       -
213   Civil and religious      -                -                   -             2=civil and religious  -
214   Polygamous               -                -                   3=polygamous  -                      -
220   Consensual union         -                1=free union        -             5=free union           -
300   SEPARATED/DIVORCED       -                3=sep. or divorced  -             -                      -
310   Separated                -                -                   6=separated   8=separated            3=separated
321   Legally separated        -                -                   -             -                      -
322   De facto separated       -                -                   -             -                      -
330   Divorced                 4=divorced       -                   5=divorced    7=divorced             4=divorced
400   WIDOWED                  3=widowed        5=widowed           4=widowed     6=widowed              5=widowed
999   UNKNOWN/MISSING          0=missing        6=unknown           B=blank       1=unknown              -
(- = no corresponding code in that sample)
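A minimal sketch of how the MARST table might be applied to pooled records, assuming the source-variable columns line up with China 1982, Colombia 1973, Mexico 1970, and U.S.A. 1990 (Kenya is left out to keep the sketch short). The names `MARST_RECODE`, `harmonize`, and `general_category` are hypothetical, and reading the general category off the leading digit is an assumption based on the code layout shown, not a documented rule.

```python
# Source variable -> {source code: harmonized MARST code}, transcribed from the table.
MARST_RECODE = {
    "CN82A403": {"1": 100, "2": 210, "3": 400, "4": 330, "0": 999},
    "CO73A411": {"4": 100, "2": 210, "1": 220, "3": 300, "5": 400, "6": 999},
    "MX70A402": {"9": 100, "3": 211, "4": 212, "2": 213, "5": 220,
                 "8": 310, "7": 330, "6": 400, "1": 999},
    "US90A425": {"6": 100, "1": 210, "3": 310, "4": 330, "5": 400},
}


def harmonize(source_var, source_code):
    """Translate one sample-specific code into the harmonized MARST code."""
    return MARST_RECODE[source_var][str(source_code)]


def general_category(marst_code):
    """Assumption: the leading digit is the general category
    (1 single, 2 married/in union, 3 separated/divorced, 4 widowed, 9 unknown)."""
    return marst_code // 100


if __name__ == "__main__":
    pooled = [("CN82A403", "2"), ("CO73A411", "1"), ("MX70A402", "3"),
              ("CO73A411", "3"), ("US90A425", "3")]
    for var, code in pooled:
        detailed = harmonize(var, code)
        print(var, code, "->", detailed, "general", general_category(detailed))
```

The detailed codes keep every distinction a sample makes, while the general category stays comparable across samples: Colombia's combined "separated or divorced" (300) and the U.S. "separated" (310) both fall under general category 3.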
XML Harmonization Table
<sample>
<id>gh2000a</id>
<rectype>P</rectype>
<svar>GH00A401</svar>
<recode>
<orig>1</orig>
<targ>1000</targ>
<lab>Head</lab>
<freq>347162</freq>
</recode>
<recode>
<orig>3</orig>
<targ>2000</targ>
<lab>Spouse</lab>
<freq>178544</freq>
</recode>
<recode>
<orig>4</orig>
<targ>3000</targ>
<lab>Child</lab>
<freq>707986</freq>
</recode>
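A harmonization table in this form can be read into a recode lookup with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the function name `load_recodes` is hypothetical, and the closing `</sample>` tag is an assumption added so the fragment parses, since the slide shows only the beginning of the element.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the slide. The closing </sample> tag is an assumption:
# the slide cuts off before the element ends.
XML_TEXT = """
<sample>
  <id>gh2000a</id>
  <rectype>P</rectype>
  <svar>GH00A401</svar>
  <recode><orig>1</orig><targ>1000</targ><lab>Head</lab><freq>347162</freq></recode>
  <recode><orig>3</orig><targ>2000</targ><lab>Spouse</lab><freq>178544</freq></recode>
  <recode><orig>4</orig><targ>3000</targ><lab>Child</lab><freq>707986</freq></recode>
</sample>
"""


def load_recodes(xml_text):
    """Return (source variable name, {original code: (target code, label, frequency)})."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for rec in root.findall("recode"):
        table[rec.findtext("orig")] = (
            int(rec.findtext("targ")),
            rec.findtext("lab"),
            int(rec.findtext("freq")),
        )
    return root.findtext("svar"), table


if __name__ == "__main__":
    svar, recodes = load_recodes(XML_TEXT)
    print(svar)
    for orig, (targ, lab, freq) in recodes.items():
        print(orig, "->", targ, lab, freq)
```

Carrying the frequency alongside each recode makes it possible to check that every case in the source variable has been assigned a target code.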
Census Questionnaire (Mexico 2000)
Water access
5. Number of Rooms
How many rooms are used for sleeping without counting hallways? _____ Write the number
Without counting the hallways or bathrooms, how many total rooms are in this dwelling? Count the kitchen. _____ Write the number
6. Access to water
Read all of the options until you get an affirmative answer. Circle only one answer
1 Running water inside the dwelling
2 Running water outside the dwelling but on the land
3 Running water from a public faucet or hydrant
4 Running water that is carried from another dwelling
5 Tanked in by truck
6 Water from a well, river, lake, stream or other
Answers 3, 4, 5, 6 continue with number 8
7. Water supply
How many days of the week is water available? Circle only one answer
1 Daily
2 Every third day
3 Twice a week
4 Once a week
5 Occasionally
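The skip instruction in question 6 defines the universe for question 7: only dwellings answering 1 or 2 are asked about water supply. A minimal sketch of that logic follows, assuming a "not in universe" code of 0 as on the Data Standardization slide; the function names and the choice of 0 for this particular variable are hypothetical.

```python
# Assumption: 0 marks "not in universe", following the Data Standardization slide.
NIU = 0


def in_water_supply_universe(access_code):
    """Question 7 is asked only when question 6 was answered 1 or 2;
    answers 3, 4, 5, and 6 skip ahead to question 8."""
    return access_code in (1, 2)


def clean_water_supply(access_code, supply_code):
    """Return the question 7 answer, or NIU when the question was skipped."""
    if not in_water_supply_universe(access_code):
        return NIU
    return supply_code


if __name__ == "__main__":
    # (question 6 answer, question 7 answer as recorded)
    dwellings = [(1, 1), (2, 4), (5, None), (6, None)]
    for access, supply in dwellings:
        print(access, supply, "->", clean_water_supply(access, supply))
```

Separating "not asked" from "asked but unknown" is what lets users of the data tell a skipped question apart from a missing answer.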
Text of Census Questionnaire (Mexico 2000)
Water access
XML-Tagged Census Questionnaire (Mexico 2000)
Variable Description (Literacy)
Availability of Selected Person Variables
(Number of samples)
Relationship to head       111    Religion               48
Age                        111    Language               26
Sex                        111    Ethnicity              36
Marital status             110    Race                   19
Age at first marriage       16    School attendance      87
Children ever born          81    Literacy               75
Children surviving          51    Education attainment  100
Mother's mortality status   15    Years of schooling     65
Country of birth            72    Employment status     102
Place of birth              78    Class of worker       103
Citizenship                 57    Occupation            100
Year of immigration         18    Industry               99
Migration, international    43    Hours worked weekly    37
Migration, internal         87    Total income           23
Disability                  29    Earned income          22
Availability of Selected Household Variables
(Number of samples)
Urban-rural status     75    Electricity    69
Geography, 1st level  101    Water          80
Geography, 2nd level   73    Sewage         70
Home ownership         94    Toilet         76
Number of rooms        86    Cooking fuel   34
Floor material         42    Telephone      49
Wall material          34    Television     42
Roof material          23    Computer       14
Living area            12    Automobiles    39
Number of Geographic Units Identified
Mexico        2454    Malaysia     133    France        22
USA           2071    Ecuador      129    Mongolia      21
Brazil        1447    Ghana        110    Italy         19
Philippines   1173    Bolivia       84    Armenia       19
Colombia       532    India         78    Israel        18
Spain          366    Kenya         69    Palestine     16
China          347    Vietnam       61    UK            13
Argentina      315    Costa Rica    61    Slovenia      13
Egypt          278    Kyrgyz        55    Rwanda        12
Venezuela      235    Romania       47    Canada        11
South Africa   225    Jordan        44    Portugal       7
Chile          178    Iraq          44    Belarus        6
Uganda         163    Guinea        34    Netherlands    1
Greece         154    Panama        31    Hungary        1
Cambodia       149    Austria       31
Size of Geographic Units by Country
Mexico        12    Cambodia       73    Iraq            365
Colombia      32    Mongolia       87    Romania         414
Brazil        41    South Africa   96    Rwanda          731
Philippines   41    Malaysia       98    Portugal        776
Ecuador       44    Slovenia      100    Canada          963
Spain         46    USA           130    Vietnam       1,045
Costa Rica    47    Ghana         134    Belarus       1,413
Bolivia       48    Uganda        137    France        2,087
Venezuela     48    Palestine     138    Italy         2,114
Panama        49    Armenia       142    China         2,985
Greece        50    Guinea        168    UK            5,143
Chile         55    Egypt         195    India         8,635
Argentina     61    Austria       225    Hungary      10,210 *
Jordan        65    Israel        332    Netherlands  15,986 *
Kyrgyz        70    Kenya         359
(Median population in 000s)
Urban Definitions (N of countries)
Administrative divisions          11
Population threshold              10
Population threshold and . . .
    Administrative divisions       1
    Agglomeration/density          4
    Functional criteria            4
Agglomeration/density and . . .
    Administrative divisions       2
    Functional criteria            3
(Functional criteria include infrastructure, businesses, agriculture, etc.)
National Stats Office
Questionnaire
Data collection
Data processing
Aggregate statistics
Tabulator
Public samples
Full microdata
Samples drawn
Public samples
IPUMS samples
Harmonization
Aggregate statistics
IPUMS
Sampling
Donation
Confidentiality