and evaluation program results from the - census.gov · nancy w. osbourn, arona l. pistiner, ronald...

83
Census 2000 Synthesis Report No. 16 TR-16 Issued March 2004 Results From the Administrative Records Experiment in 2000 U.S. Department of Commerce Economics and Statistics Administration U.S. CENSUS BUREAU Census 2000 Testing, Experimentation, and Evaluation Program

Upload: trinhdieu

Post on 19-Jul-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Census 2000 Synthesis Report No. 16TR-16

Issued March 2004

Results From theAdministrativeRecordsExperimentin 2000

U.S.Department of CommerceEconomics and Statistics Administration

U.S. CENSUS BUREAU

Census 2000 Testing, Experimentation, and Evaluation Program

Page 2: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

The Census 2000 Evaluations Executive SteeringCommittee provided oversight for the Census 2000Testing, Experimentation, and Evaluations (TXE)Program. Members included Cynthia Z. F. Clark,Associate Director for Methodology and Standards;Preston J. Waite, Associate Director for DecennialCensus; Carol M. Van Horn, Chief of Staff; TeresaAngueira, Chief of the Decennial ManagementDivision; Robert E. Fay III, Senior MathematicalStatistician; Howard R. Hogan, (former) Chief of theDecennial Statistical Studies Division; Ruth AnnKillion, Chief of the Planning, Research and EvaluationDivision; Susan M. Miskura, (former) Chief of theDecennial Management Division; Rajendra P. Singh,Chief of the Decennial Statistical Studies Division;Elizabeth Ann Martin, Senior Survey Methodologist;Alan R. Tupek, Chief of the Demographic StatisticalMethods Division; Deborah E. Bolton, AssistantDivision Chief for Program Coordination of thePlanning, Research and Evaluation Division; Jon R.Clark, Assistant Division Chief for Census Design ofthe Decennial Statistical Studies Division; David L.Hubble, (former) Assistant Division Chief forEvaluations of the Planning, Research and EvaluationDivision; Fay F. Nash, (former) Assistant Division Chieffor Statistical Design/Special Census Programs of theDecennial Management Division; James B. Treat,Assistant Division Chief for Evaluations of the Planning,Research and Evaluation Division; and VioletaVazquez of the Decennial Management Division.

As an integral part of the Census 2000 TXE Program,the Evaluations Executive Steering Committee char-tered a team to develop and administer the Census2000 Quality Assurance Process for reports. Past andpresent members of this team include: Deborah E.Bolton, Assistant Division Chief for ProgramCoordination of the Planning, Research and EvaluationDivision; Jon R. Clark, Assistant Division Chief forCensus Design of the Decennial Statistical StudiesDivision; David L. Hubble, (former) Assistant DivisionChief for Evaluations and James B. Treat, AssistantDivision Chief for Evaluations of the Planning, Researchand Evaluation Division; Florence H. Abramson,Linda S. Brudvig, Jason D. Machowski, andRandall J. Neugebauer of the Planning, Research and Evaluation Division; Violeta Vazquez of theDecennial Management Division; and Frank A.Vitrano (formerly) of the Planning, Research andEvaluation Division.

The Census 2000 TXE Program was coordinated by thePlanning, Research and Evaluation Division: Ruth AnnKillion, Division Chief; Deborah E. Bolton, AssistantDivision Chief; and Randall J. Neugebauer andGeorge Francis Train III, Staff Group Leaders. KeithA. Bennett, Linda S. Brudvig, Kathleen HaysGuevara, Christine Louise Hough, Jason D.Machowski, Monica Parrott Jones, Joyce A. Price,Tammie M. Shanks, Kevin A. Shaw,

George A. Sledge, Mary Ann Sykes, and CassandraH. Thomas provided coordination support. FlorenceH. Abramson provided editorial review.

This report was prepared by Barry V. Bye (formerly)and Dean H. Judson of the Planning, Research andEvaluation Division. The following authors and projectmanagers prepared Census 2000 experiments andevaluations that contributed to this report:

Planning, Research and Evaluation Division:D. Mark BauderMichael A. BerningRalph H. CookHarley K. HeimovitzDean H. Judson

The authors would like to recognize the following keycontributors to the Administrative Records Experimentin 2000: Bashiruddin Ahmed, Mikhail Z. Batkhan,D. Mark Bauder, Michael A. Berning, HaroldBobbitt, Barry V. Bye, Benita Dawson, JosephConklin, Katherine D. Conklin, Gary B. Chappell,Ralph H. Cook, Ann H. Daniele, Matthew R.Falkenstein, Eleni E. Franklin, James E. Farber,Mark Gorsak, Harley K. Heimovitz, FredHolloman, David Hilnbrand, David L. Hubble,Robert J. Jeffrey, Dean H. Judson, Norman L.Kaplan, Vickie L. Kee, Francina Kerr, Jeong SookKim, Myoung Ouk Kim, Charlene A. Leggieri,John F. Long, John Lukasiewicz, Mark Moran,Daniella Mungo, Esther R. Miller, Tamany Mulder,Nancy W. Osbourn, Arona L. Pistiner, Ronald C.Prevost, Dean M. Resnick, Pamela Ricks, Paul L.Riley, Douglas Sater, Doug Scheffler, Kevin A.Shaw, Kevin M. Shaw, Larry D. Sink, DianeSimmons, Amy Symens-Smith, Cotty ArmstrongSmith, Herbert Thompson, Deborah A. Wagner,Phyllis Walton, Signe I. Wetrogan, David L. Word,Mary F. Untch, and members of the AdministrativeRecords Experiment 2000 Implementation Group.

Greg Carroll and Everett L. Dove of the Admin-istrative and Customer Services Division, and WalterC. Odom, Chief, provided publications and printingmanagement, graphic design and composition, and edi-torial review for print and electronic media. Generaldirection and production management were providedby James R. Clark, Assistant Division Chief, andSusan L. Rappa, Chief, Publications Services Branch.

Acknowledgments

Page 3: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Department of CommerceDonald L. Evans,

Secretary

Vacant,Deputy Secretary

Economics and Statistics AdministrationKathleen B. Cooper,

Under Secretary for Economic Affairs

U.S. CENSUS BUREAUCharles Louis Kincannon,

Director

Census 2000 Synthesis Report No. 16Census 2000 Testing, Experimentation,

and Evaluation Program

RESULTS FROM THEADMINISTRATIVE RECORDS

EXPERIMENT IN 2000

TR-16

Issued March 2004

Page 4: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Suggested Citation

Barry V. Bye and Dean H. JudsonCensus 2000 Testing,

Experimentation, and Evaluation ProgramSynthesis Report No. 16, TR-16, Results From the Administrative

Records Experiment in 2000, U. S. Census Bureau,

Washington, DC 20233ECONOMICS

AND STATISTICS

ADMINISTRATION

Economics and StatisticsAdministration

Kathleen B. Cooper,Under Secretary for Economic Affairs

U.S. CENSUS BUREAU

Charles Louis Kincannon,Director

Hermann Habermann,Deputy Director and Chief Operating Officer

Cynthia Z. F. Clark,Associate Director for Methodology and Standards

Preston J. Waite, Associate Director for Decennial Census

Teresa Angueira, Chief, Decennial Management Division

Ruth Ann Killion, Chief, Planning, Research and Evaluation Division

For sale by the Superintendent of Documents, U.S. Government Printing Office

Internet: bookstore.gpo.gov Phone: toll-free 866-512-1800; DC area 202-512-1800

Fax: 202-512-2250 Mail: Stop SSOP, Washington, DC 20402-0001

Page 5: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 iii

Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .vii

Executive Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1

1. Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .71.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .71.2 Administrative Record Census–definition and

requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .71.3 Administrative Records Experiment objectives . . . . . . . .81.4 Administrative Records Experiment top-down and

bottom-up methods . . . . . . . . . . . . . . . . . . . . . . . . . . .81.5 Experimental sites . . . . . . . . . . . . . . . . . . . . . . . . . . . .91.6 Administrative Records Experiment source files . . . . . .101.7 Administrative Records Experiment evaluations . . . . . .101.8 Limitations of the experiment . . . . . . . . . . . . . . . . . . .10

2. The Administrative Records Experiment Process Evaluation . .132.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .132.2 Top-down enumeration . . . . . . . . . . . . . . . . . . . . . . . .142.3 Bottom-up enumeration . . . . . . . . . . . . . . . . . . . . . . .17

3. Administrative Records Experiment Outcomes and HouseholdEvaluations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .253.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .253.2 Administrative Records Experiment enumeration

outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .253.3 Household-level analysis . . . . . . . . . . . . . . . . . . . . . . .43

4. Discussion and Recommendations for 2010 Planning and Other Census Bureau Programs . . . . . . . . . . . . . . . . . . . . . .614.1 Additional Administrative Records Experiment 2000

evaluations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .614.2 Race and Hispanic origin enhanced Numident . . . . . . .624.3 Household substitution for nonresponse followup/

unclassified households in 2010 . . . . . . . . . . . . . . . . .634.4 Person unduplication in the 2010 Census . . . . . . . . . .644.5 Master Address File improvement . . . . . . . . . . . . . . . .654.6 Other Census Bureau programs . . . . . . . . . . . . . . . . . .65

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .67

Attachment 1. Administrative Records Experiment 2000 Implementation Flow Chart . . . . . . . . . . . . . . . . .69

Attachment 2. StARS Process Steps – Outline . . . . . . . . . . . . . . .70

Contents

Page 6: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

iv Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

LIST OF TABLES

Table 1. Source file characteristics . . . . . . . . . . . . . . . . . . . . .13

Table 2. Reference dates of source files . . . . . . . . . . . . . . . . . .14

Table 3. Percent of cases with imputed race or Hispanic origin by age and county . . . . . . . . . . . . . . . . . . . . . . . . . . .15

Table 4. StARS 1999 and AREX test site geocoding tallies . . . . .15

Table 5. AREX administrative record address geocoding results 16

Table 6. Top-down population tallies . . . . . . . . . . . . . . . . . . . .17

Table 7. Addresses eligible for the match to the DMAF . . . . . . .18

Table 8. Computer match results – administrative records counts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .18

Table 9. Clerical review match results . . . . . . . . . . . . . . . . . . .18

Table 10. Selection of FAV addresses . . . . . . . . . . . . . . . . . . . . .19

Table 11. FAV sample results . . . . . . . . . . . . . . . . . . . . . . . . . .19

Table 12. Bottom-up administrative records address processing .20

Table 13. Census pull results . . . . . . . . . . . . . . . . . . . . . . . . . .21

Table 14. Bottom up method population tallies . . . . . . . . . . . . .21

Table 15. Geographic differences for persons in both the top-down and bottom-up methods . . . . . . . . . . . . . . .22

Table 16. Bottom-up “best” address by FAV status . . . . . . . . . . .23

Table 17. Top-down and bottom-up counts of total household population by county . . . . . . . . . . . . . . . . . . . . . . . .27

Table 18. Coverage by AREX of census housing units . . . . . . . . .45

Table 19. Coverage by AREX of census housing units, by NRFU status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .45

Table 20. Coverage by AREX of census housing units, by imputation status . . . . . . . . . . . . . . . . . . . . . . . . . . .46

Table 21. Distributions of household size for Census and AREX for all five AREX counties. Nonvacant housing units only . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .46

Table 22. Comparison of Census and AREX household size, by NRFU status, and by imputation status. For linked housing units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .47

Table 23. Comparisons between AREX and Census for demographicgroups, for linked households with the same number of people only . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .47

Table 24. Comparison of AREX and Census demographic composition of households. For linked households with the same number of people only, by size . . . . . . . . . .48

Page 7: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Table 25. Comparison of AREX and Census demographic groups within households. For linked households with the same number of people only, by size . . . . . . . . . . . . .48

Table 26. Comparison of match rates and household comparisons between occupied housing units at multi-unit BSAs and housing units at single-unit BSAs . . . . . . . . . . . . . . . .49

Table 27. Coverage by multi vs. single unit, and by household age characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . .49

Table 28. AREX to census comparisons by size of housing unit and by household age characteristics . . . . . . . . . . . . .50

Table 29. The effect of the presence of persons other than white in a household on household match rates and comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . .51

Table 30. The effect of the presence of Hispanics on household match rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .52

Table 31. The effect of AREX imputed race on household comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .52

Table 32. Overall response profile for the “match” variable . . . . .54

Table 33. Goodness-of-fit measures for the logistic regression model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .54

Table 34. Maximum likelihood parameter estimates, standard errors, and approximate tests . . . . . . . . . . . . . . . . . .55

Table 35. Classification results for predicted probabilities. . . . . .56

Table 36. Summary of match rates and household comparisons between AREX and Census . . . . . . . . . . . . . . . . . . . . .58

LIST OF FIGURES

Figure 1. Summary diagram of AREX 2000 design . . . . . . . . . . . .9

Figure 2. Net population difference by sex, county, and collection method–CO . . . . . . . . . . . . . . . . . . . . . . . .27

Figure 3. Net population difference by sex, county, and collection method–MD . . . . . . . . . . . . . . . . . . . . . . . .28

Figure 4. Net population difference by age, county and collection method–MD . . . . . . . . . . . . . . . . . . . . . . . .29

Figure 5. Net population difference by age, county and collection method–CO . . . . . . . . . . . . . . . . . . . . . . . .30

Figure 6. Net population difference by race, county and collection method–MD . . . . . . . . . . . . . . . . . . . . . . . .31

Figure 7. Net population difference by race, county and collection method–CO . . . . . . . . . . . . . . . . . . . . . . . .32

Figure 8. Sex ALPE by county and collection method–MD . . . . . .33

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 v

Page 8: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Figure 9. Sex ALPE by county and collection method–CO . . . . . .34

Figure 10. Age ALPE by county and collection method–MD . . . . .35

Figure 11. Age ALPE by county and collection method–CO . . . . . .36

Figure 12. Race ALPE by county and collection method–MD . . . . .37

Figure 13. Race ALPE by county and collection method–CO . . . . .38

Figure 14. Distribution of tracts with under- and overcounts oftotal population . . . . . . . . . . . . . . . . . . . . . . . . . . . . .39

Figure 15. Distribution of blocks with under- and overcounts of total population . . . . . . . . . . . . . . . . . . . . . . . . . . . . .40

Figure 16. AREX - Census total population ALPEs: Maryland tracts 41

Figure 17. AREX - Census ALPEs for the total population: Colorado tracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . .42

Figure 18. Goodness of fit diagnostic plot . . . . . . . . . . . . . . . . .57

vi Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Page 9: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 vii

The Census 2000 Testing, Experimentation, and Evaluation Programprovides measures of effectiveness for the Census 2000 design,operations, systems, and processes and provides information on the value of new or different methodologies. By providing measuresof how well Census 2000 was conducted, this program fully sup-ports the Census Bureau’s strategy to integrate the 2010 planningprocess with ongoing Master Address File/TIGER enhancements andthe American Community Survey. The purpose of the report that follows is to integrate findings and provide context and backgroundfor interpretation of related Census 2000 evaluations, experiments,and other assessments to make recommendations for planning the 2010 Census. Census 2000 Testing, Experimentation, andEvaluation reports are available on the Census Bureau’s Internet siteat: www.census.gov/pred/www/.

Foreword

Page 10: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

This page intentionally left blank.

Page 11: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

The Administrative RecordsExperiment 2000 was an experi-ment in two areas of the countrydesigned to gain informationregarding the feasibility of con-ducting an administrative recordscensus or the use of administrativerecords in support of conventionaldecennial census processes. Thefirst experiment of its kind in theUnited States, the AdministrativeRecords Experiment 2000 was partof the Census 2000 Testing,Experimentation and EvaluationProgram. The focus of this pro-gram was to measure the effective-ness of new techniques for decen-nial census enumeration. Therewere four evaluations: Process,Outcomes, Household, andRequest for Physical Address evalu-ations. The first three are summa-rized here.

Administrative RecordsCensus definition andrequirements

In the Administrative RecordsExperiment, an administrativerecord census was defined as aprocess that relies primarily, butnot necessarily exclusively, onadministrative records to producethe population count and contentof the decennial census short formwith a strong focus on apportion-ment and redistricting require-ments. In addition to total popula-tion counts by state, the decennialcensus must provide counts of thevoting age (18 and over) popula-tion by race and Hispanic origin forsmall geographic areas, currentlyin the form of census blocks.

Demographically, theAdministrative Records Experimentprovided date of birth, race,Hispanic origin, and sex.Geographically, the AdministrativeRecords Experiment operated atthe level of basic street addressand corresponding Census blockcode. Unit numbers for multi-unitdwellings were used in certainaddress matching operations andone of the evaluations; but gener-ally household and family composi-tion were not captured. Thedesign did assume the existence ofa Master Address File and geo-graphic coding capability similar tothat available for the Census 2000.

The principal objectives ofAdministrative Records Experiment2000 were twofold. The firstobjective was to develop and com-pare two methods for conductingan administrative records census,one that used only administrativerecords and a second that addedsome conventional support to theprocess in order to complete theenumeration. The second objectivewas to explore the potential use ofadministrative records data forsome nonresponding or unclassi-fied households that occur in aconventional census.

Administrative RecordsExperiment top-down andbottom-up methods

A two-phase process accomplishedthe Administrative RecordsExperiment 2000 enumeration.The first, or Top-down, phaseinvolved the assembly of recordsfrom a number of national adminis-trative record systems and undupli-

cation of individuals within thecombined systems. This was fol-lowed by computer geocoding ofstreet addresses to the level ofcensus block and two attempts toobtain and code physical addressesfor those that would not geocodeby computer. Finally, there was aselection of “best” demographiccharacteristics for each individualand “best” street address withinthe experimental sites.

The second phase of theAdministrative Records Experiment2000 design was an attempt tocomplete the administrative-records-only enumeration by thecorrection of errors in administra-tive records addresses throughaddress verification (a coverageimprovement analogue) and byadding persons missed in theadministrative records (a nonre-sponse followup analogue).Considering the Top-down andBottom-up processes as part of oneoverall design, the AdministrativeRecords Experiment can bethought of as a prototype for amore or less conventional censuswith the initial mailout replaced bya Top-down administrative recordsenumeration.

Limitations

There were four principal limita-tions on the experiment.

• The administrative recordssource files were limited tothose used in the creation of theStatistical AdministrativeRecords System 1999, whichrelied primarily on files for taxyear 1998 and other files

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 1

Executive Summary

Page 12: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

extracted early in calendar year

1999. These files neither

exhausted the national-level

administrative records that

might have been available for

Administrative Records

Experiment 2000 nor were they

the timeliest with respect to

April 1, 2000, Census Day for

Census 2000.

• The number of experimental

sites was small. Although it

would not have been reasonable

or realistic to attempt to mount

this first Administrative Records

Experiment in a representative

sample of geographic areas

large enough to make national

estimates, additional sites

would have provided more con-

fidence that the results were not

idiosyncratic to the sites select-

ed.

• There was no experimental vari-

ation in key design parameters

such as the clerical and field

operations and the address

selection algorithm. Without

some factorial or fractional fac-

torial structure, direct estimates

of operational impacts of com-

ponents, individually or in com-

bination, were not possible.

• The measurement of race and

Hispanic origin in administrative

records at the national level is

deficient. Attempts were made

to improve the measurement

through the use of certain statis-

tical models, but the results

were not entirely satisfactory.

The limitations in the

Administrative Records Experiment

were largely due to resource con-

straints and a short planning peri-

od for what was an extremely com-

plex and novel undertaking.

Experimental sites

Two sites were selected that werebelieved to have a total of approxi-mately one million housing unitsand a population of approximatelytwo million persons. One siteincluded Baltimore City andBaltimore County, Maryland. Theother site included Douglas, ElPaso and Jefferson Counties,Colorado. The sites provided amix of population and housingcharacteristics needed to assessthe difficulties that might arise inconducting an administrativerecords census.

Administrative RecordsExperiment outcomesevaluation

As expected, the Bottom-up cover-age is much improved compared tothe Top-down, and this is largelydue to the completion of the Top-down enumeration by using cen-sus data for nonmatched address-es, which simulates a followup tothe administrative records enumer-ation. Specifically, the Bottom-upcoverage of children (81 percent -94 percent across the test sites) issubstantially better than the Top-down (72 percent - 83 percent).Coverage of children is a particularweakness for administrativerecords used in AdministrativeRecords Experiment 2000.

Adults in the Bottom-up are moreor less uniformly overcounted (102percent - 104 percent). The over-count of adults most likely is dueto unaccounted for deaths in the12 months prior to Census Day,the lack of special populationsoperations in the AdministrativeRecords Experiment (e.g., a groupquarters enumeration), and failureto unduplicate persons afteradding census data for non-matched addresses. Of course, thelatter means that there is someduplication of children as well.

Detailed enumeration resultsfocused mainly on a comparison ofthe Bottom-up enumeration withthe Census 2000. The analysis didnot include group quarters and,due to limitations in the adminis-trative records sources, personscould not be reported with “multi”or “other” race. The analysis pro-gressed from large geographicareas to small geographic areas,beginning with the five test sitecounties and ending with censusblocks within the sites. The evalu-ation incorporated a variety ofmethods to accomplish its objec-tives, including univariate and mul-tivariate statistical analyses of theAdministrative RecordsExperiment/Census 2000 differ-ences, and spatial/ecological mapsthat examined the geographic dis-tributions of key comparison meas-ures. The outcomes evaluationtried to disentangle the influenceof demographic change andAdministrative Records Experimentprocessing, coverage and dataquality issues, while presentingbasic enumeration statistics.

At the county level, the Bottom-upprocess undercounted total popula-tion in all sites except BaltimoreCity. As with the total population,males and females were under-counted in all sites exceptBaltimore City, but the femaleundercounts were slightly greaterthan male undercounts. Agegroups showed more variabilitywith most groups undercounted.Generally the size of the under-counts increased with decreasingage, except for the 20-24 agegroup. These patterns did notappear to be site-specific.Overcounts for the oldest old andundercounts for the youngest per-sons suggest that much more time-ly birth and death informationmust be obtained. Also, the spe-cial enumeration requirements for

2 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Page 13: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

populations such as college stu-

dents, the military and persons in

nursing homes must be incorporat-

ed into administrative records

processes.

Administrative records are not cur-

rently a good source of data for

race and Hispanic origin, and the

models were not sufficient to cor-

rect their deficiencies. Blacks and

Hispanics were undercounted

when they were a large minority

group and overcounted when they

were not. American Indians and

Alaskan natives were not well iden-

tified and the accuracy of

Asian/Pacific Islander counts was

uncertain.

Bottom-up tract-level total popula-

tion results indicated a good corre-

spondence between Administrative

Records Experiment and the cen-

sus. The population counts of 70

percent of tracts were within 5 per-

cent points, and 95 percent of the

tracts were within 25 percentage

points, though a sizable number of

tracts had moderate and large

undercounts. At the block-level,

population counts were the least

accurate. For the total population

38 percent of blocks met the 5

percent criterion and about 85 per-

cent of blocks met the 25 percent

criterion.

A multivariate analysis of block dif-

ferences showed that large under-

counts were associated with such

block characteristics as high popu-

lation density, high rental rates,

and large proportions of persons

age 20-24. Large overcounts were

associated with high vacancy rates,

low population density, small pro-

portions of persons under the age

of 20 and large proportions of per-

sons age 20-24 and age 65 and

over (Heimovitz, 2002).

Administrative RecordsExperiment household-levelanalysis

The general goal of the household-level analysis was to assess howwell households formed fromadministrative records matchedthose from Census 2000 address-es. The evaluation focused, first,on the factors associated withAdministrative Records Experimentand Census 2000 addresses thatwere (computer) linked. Then,demographic comparisons weremade between households atlinked addresses. There was aspecial focus on Census 2000households that required a nonre-sponse followup and Census 2000unclassified (imputed) households.

The evaluation used both descrip-tive analyses and logistic regres-sion analysis to assess the cover-age and accuracy of AdministrativeRecords Experiment households.Descriptive analyses were per-formed for households in all fiveAdministrative Records Experimentcounties and for the Census 2000nonresponse followup and imputedhouseholds in the test sites. Alogistic regression model wasdeveloped to predict the probabili-ty of an accurate household matchusing address and AdministrativeRecords Experiment processingcharacteristics as predictors.Addresses with a high probabilityof correct demographic matchbetween occupants might be can-didates for administrative recordssubstitution in the case of nonre-sponse followup in a conventionalcensus. In the following discus-sion the term “linked” is used tomean a matched address. Theterm “matched” is reserved forhousehold demographic compar-isons at linked addresses.

Administrative RecordsExperiment’s coverage of the cen-

sus nonresponse followup universewas not as good as its coverage ofthe overall universe.Administrative Records Experimenthousing units were linked with70.9 percent of the census nonre-sponse followup housing units,compared with 88.4 percent of thecensus responding housing units.For occupied nonresponse fol-lowup housing units, the coveragerate was 76.7 percent. TheAdministrative Records Experimenthousing units were linked with63.2 percent of households thatwere imputed to have people inthem, and 34.7 percent of thoseimputed to be vacant.

The Administrative RecordsExperiment and the census count-ed the same number of people inthe housing unit for 51.1 percentof the 889,638 linked households,and Administrative RecordsExperiment was within one of thecensus for 79.4 percent of theunits. The 51.1 percent is effec-tively a ceiling on the percent oflinked households that had exactlythe same persons fromAdministrative Records Experimentand Census 2000. Although errorsin address linkage would accountfor some of the mismatchedhouseholds, the deficiencies inadministrative records cited earlierin this report–missing children,lack of special population opera-tions and the time gap betweenthe administrative records extractsand Census Day–most likelyaccount for the major part.

For linked nonresponse followuphousing units, AdministrativeRecords Experiment had the samenumbers of persons for 37.0 per-cent of the units and was withinone 69.3 percent of the time.Census 2000 nonresponse fol-lowup housing units were moresusceptible to the AdministrativeRecords Experiment deficiencies

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 3

Page 14: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

than responding units. In addition,enumeration errors in Census 2000might have been higher for theseunits.

The regression analysis demon-strated a number of factors associ-ated with greater probability ofmatched household demographics.These include: single unit addressrather than multi-unit, householdwith only one or two members, allhousehold occupants over the ageof 65, at least one White occupant,no occupant with imputed race inthe Administrative RecordsExperiment. The predictive powerof the model was moderatelystrong. At a predicted probabilityof 0.5 or higher, the probability ofa correct household match wasabout 72 percent. At a predictedprobability of 0.8 or higher, theprobability of a correct matchincreased to about 83 percent, butthe proportion of addresses withpredicted probability this high wasonly about 4 percent of alladdresses. Evidently, the limita-tions in the data, particularly theadministrative records cutoffs andpoor race and Hispanic originmeasurement, made householdprediction quite difficult.

Implications for 2010planning

Substitution for 2010 nonresponsefollowup households should contin-ue to be explored

Although the results of the house-hold-level analysis were not defini-tive due to the limitations onAdministrative Records Experiment2000, they were sufficiently strongthat research into the substitutionof administrative records house-holds for nonresponse followup orunclassified households in a con-ventional census should continue.For nonresponse followup house-holds there is the potential for sig-nificant cost savings, and for

unclassified households, the poten-tial for greater accuracy than thatprovided by imputation.

The approach piloted in theAdministrative Records Experiment2000 should be tested as part ofthe 2004 Census Test using mod-els developed from a linkage ofStatistical Administrative RecordsSystem 2000 data to the Census2000 files. The timing of theadministrative records in theStatistical Administrative RecordsSystem 2000 would be much clos-er to Census Day than theStatistical Administrative RecordsSystem 1999 data used in theAdministrative Records Experiment2000, and much more like the datathat could be acquired for 2010.

Other 2010 impacts shouldbe considered

There are other aspects of 2010Census development in whichadministrative records might play arole. These include MasterAddress File improvements, devel-opment and testing of unduplica-tion methods for 2010, subnation-al Demographic Analysis, andcoverage measurement research.

2010 data acquisition andresearch agenda

Arrangements should be made toacquire administrative records on atimelier basis and to obtain somedata sets that might fill some ofthe administrative records cover-age gaps.

A research agenda for 2010 wouldinclude:

• Additional evaluation of theimpact of clerical and field oper-ations in Administrative RecordsExperiment 2000.

• Person unduplication in theAdministrative RecordsExperiment Bottom-up process.

• Repeating AdministrativeRecords Experiment 2000 withStatistical AdministrativeRecords System 2000 data.

• Repeating the Household-levelanalysis using StatisticalAdministrative Records System2000 data.

• Analysis of administrativerecords coverage gaps, in partic-ular gaps related to persons ingroup quarters.

• Master Address File improve-ments using administrativerecords.

• Improving address linkage tech-niques.

• Enhancing Numident race andHispanic origin data usingCensus 2000.

• Contributing to subnationalDemographic Analysis.

Implications for otherCensus Bureau programs

The research that went into thedevelopment of the StatisticalAdministrative Records System andAdministrative Records Experiment2000 has had significant payoffs inCensus programs other than thedecennial census, and the develop-ment of new uses for administra-tive records should continue tobenefit non-decennial programs inthe future. There have been hugegains in knowledge of thestrengths and weaknesses ofnational administrative recordssystems to support various CensusBureau activities, in the capacityfor large scale data processing,data standardization, record link-age, file unduplicaton, and SocialSecurity Number search and verifi-cation that will have benefitsthroughout the Census Bureau.

4 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Page 15: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 5

Research agenda for otherCensus Bureau programs

A research agenda for otherCensus Bureau programs couldinclude:

• Testing the use of StatisticalAdministrative Records Systemas a contributor to total popula-tion and age/race/sex/Hispanicorigin intercensal estimates.

• Testing the use of StatisticalAdministrative Records System

data for improving noninterviewweights in ongoing surveys.

• Testing the use of StatisticalAdministrative Records Systemas a tool to support small areaincome and poverty estimates.

• Continuing to test the use ofAdministrative Records databas-es for Social Security Numbervalidation and search strategies.

• Continuing to improve ourrecord linkage capabilities (for

example, linking CurrentPopulation Survey addressesand persons to comparableDecennial Census addresses andpersons), both in terms ofimprovements to noninterviewweights in ongoing surveys andsearch strategy improvements.

Page 16: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

This page intentionally left blank.

Page 17: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

1.1 Introduction

The Administrative RecordsExperiment 2000 (AREX 2000) wasan experiment in two areas of thecountry designed to gain informa-tion regarding the feasibility ofconducting an administrativerecords census (ARC), or the use ofadministrative records in supportof conventional decennial censusprocesses. The first experiment ofits kind, AREX 2000 was part ofthe Census 2000 Testing,Experimentation, and EvaluationProgram. The focus of this pro-gram was to measure the effective-ness of new techniques, method-ologies, and technologies fordecennial census enumeration.

Interest in taking a decennial cen-sus by administrative records datesback at least as far as a proposalby Alvey and Scheuren (1982)wherein records from the InternalRevenue Service (IRS) along withthose of several other agenciesmight form the core of an adminis-trative records census. Knott(1991) identified two basic ARCmodels: (1) the Top-down modelthat assembles administrativerecords from a number of sources,unduplicates them, assigns geo-graphic codes, and counts theresults; and (2) the Bottom-upmodel that matches administrativerecords to a master address file,fills the addresses with individuals,resolves gaps and inconsistenciesaddress by address, and countsthe results. There have been anumber of other calls for ARCresearch – see for exampleMyrskyla 1991; Myrskyla, Taeuberand Knott 1996; Czajka, Moreno

and Shirm 1997; Bye 1997. All ofthe proposals fit either the Top-down or Bottom-up modeldescribed here.

Knott also suggested a compositeTop-down/Bottom-up model thatwould unduplicate administrativerecords using the Social SecurityNumber (SSN), then match theaddress file, and proceed as in theBottom-up approach. In overallconcept, AREX 2000 most closelyresembles this compositeapproach.

More recently, direct use of admin-istrative records in support ofdecennial applications was cited inseveral proposals during theCensus 2000 debates on samplingfor Nonresponse Followup (NRFU).The proposals ranged from directsubstitution of administrative datafor non-responding households(Zanutto, 1996; Zanutto andZaslavsky, 1996; 1997; 2001) toaugmenting the Master AddressFile development process with U.S.Postal Service address lists(Edmonston and Schultze,1995:103). AREX 2000 providedthe opportunity to explore the pos-sibility of NRFU support.

The Administrative RecordsResearch (ARR) staff of thePlanning, Research, and EvaluationDivision (PRED) performed themajority of coordination, design,file handling, and certain fieldoperations of the experiment.Various other divisions within theCensus Bureau, including FieldDivision, Decennial Systems andContracts Management Office,Population Division, and

Geography Division supported theARRS staff.

Throughout this report, rather thanidentifying individual workgroupsor teams, we shall refer to theoperational decisions made in sup-port of AREX to be those of ARRS;that is, we shall say that “ARRSdecided to…” whenever a keyoperational decision is described,even though, of course, ARRS werenot the only decision makers.

1.2 Administrative RecordCensus–definition andrequirements

In the AREX, an administrativerecord census was defined as aprocess that relies primarily, butnot necessarily exclusively, onadministrative records to producethe population counts and contentof the decennial census short formwith a strong focus on apportion-ment and redistricting require-ments. Title 13, United StatesCode, directs the Census Bureau toprovide state population counts tothe President for the apportion-ment of Congressional seats withinnine months of Census Day. Inaddition to total population countsby state, the decennial censusmust provide counts of the votingage population (18 and over) byrace and Hispanic origin for smallgeographic areas, currently in theform of Census blocks, as pre-scribed by PL 94-171 (1975) andthe Voting Rights Act (1964).These data are used to constructand evaluate state and local leg-islative districts.

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 7

1. Background

Page 18: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Demographically, the AREX provid-ed date of birth, race, Hispanic ori-gin, and sex, although the latter isnot required for apportionment orredistricting purposes.Geographically, the AREX operatedat the level of basic street addressand corresponding Census blockcode. Unit numbers for multi-unitdwellings were used in certainaddress matching operations andone of the evaluations; but gener-ally, household and family compo-sition were not captured. In addi-tion, the design did not provide forthe collection of sample long formpopulation or housing data, needsthat will presumably be met in thefuture by the American CommunitySurvey program. The design didassume the existence of a MasterAddress File and geographic cod-ing capability similar to that avail-able for Census 2000.

1.3 Administrative RecordsExperiment objectives

The principal objectives of AREX2000 were twofold. The firstobjective was to develop and com-pare two methods for conductingan administrative records census,one that used only administrativerecords and a second that addedsome conventional support to theprocess in order to complete theenumeration. The evaluation ofthe results also included a compar-ison to Census 2000 results in theexperimental sites.

The second objective was to testthe potential use of administrativerecords data for some part of theNRFU universe, or for the unclassi-fied universe. Addresses that fallinto the unclassified status havevery limited information onthem–so limited, in fact, that theaddress occupancy status must beimputed, and, conditional on beingimputed “occupied,” the entirehousehold, including characteris-

tics, must be imputed. In order toeffectively use administrativerecords databases for substitutionpurposes; one must determinewhich kinds of administrativerecord households are most likelyto yield similar demographic distri-butions to their corresponding cen-sus households.

Other more general objectives ofthe AREX included the collection ofrelevant information, available onlyin 2000, to support ongoingresearch and planning for adminis-trative records use in the 2010Census, and the comparison of anadministrative records census toother potential 2010 methodolo-gies. These evaluations and otherdata will provide assistance inplanning major components offuture decennial censuses, particu-larly those that have administrativerecords as their primary source ofdata.

1.4 Administrative RecordsExperiment top-down andbottom-up methods

Top-down

A two-phase process accomplishedthe AREX 2000 enumeration. Thefirst phase involved the assemblyand computer geocoding ofrecords from a number of nationaladministrative record systems, andunduplication of individuals withinthe combined systems. This wasfollowed by two attempts to obtainand code physical addresses (cleri-cal geocoding and request forphysical address) for those thatwould not geocode by computer.Finally, there was a selection of“best” demographic characteristicsfor each individual and “best”street address within the experi-mental sites. Much of the comput-er processing for this phase wasperformed as part of the StatisticalAdministrative Records System(StARS) 1999 processing (Judson,

2000; Farber and Leggieri, 2002).As such, StARS 1999 was an inte-gral part of AREX 2000 design.

One can think about the results ofthe Top-down process in two ways.First, counting the population atthis point provides, in effect, anadministrative-records-only census.That is, the enumeration includesonly those individuals found in theadministrative records, and there isno other support for the censusoutside of activities related togeocoding. AREX 2000 providespopulation counts from the Top-down phase so that the efficacy ofan administrative-records-only cen-sus can be assessed.

However, one might expect an enu-meration that used only adminis-trative records to be substantiallyincomplete. Therefore, a secondway to think about the Top-downprocess is as a substitute for aninitial mailout in the context of amore conventional census thatwould include additional supportfor the enumeration.

Bottom-up

The fundamental differencebetween the Bottom-up methodand the Top-down method is theBottom-up method matches admin-istrative records addresses to aseparately developed “frame” ofaddresses, and based on thismatch, performs additional opera-tions. In this experiment, anextract of the Census Bureau’sMaster Address File (MAF) servedas the frame1.

The second phase of the AREX 2000design was an attempt to completethe administrative-records-only enu-meration by the correction of errorsin administrative records addresses

8 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

1 In this report, we use the term "MAF"generically. Our operations were based onextracts from the Decennial Master AddressFile (DMAF).

Page 19: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 9

through address verification (a cov-erage improvement analogue) andby adding persons missed in theadministrative records (a NRFUanalogue). This phase began bymatching the addresses found inthe Top-down process to the MAFin order to assess their validity andto identify those MAF addressesfor which no administrativerecords were found. A fieldaddress review (FAV) was used toverify non-matched administrativerecords addresses, and invalidadministrative records addresseswere excluded from the Bottom-upselection of best address. Indesign, non-matched MAF address-es would be canvassed in order toenumerate persons at addressesnot found in the administrativerecords systems. In the AREX,such a canvassing was simulatedby adding those persons found inthe Census 2000 at the unmatchedaddresses to the adjusted adminis-trative-records-only counts, thuscompleting the enumeration.Accomplishing the AREX as part ofthe Census 2000 obviated theneed to mount a separate fieldoperation to canvass unmatchedMAF addresses.

Considering the Top-down andBottom-up processes as part of oneoverall design, AREX can bethought of as a prototype for amore or less conventional censuswith the initial mailout replaced bya Top-down administrative recordsenumeration. Figure 1 below, pro-vides a conceptual overview of theexperiment for enumerating thepopulation tested during the AREX.A more detailed description of dataprocessing flows can be found inAttachment 1. The graphicaldescription presented here isintended to convey the concept ofboth AREX methods when viewedin terms of the Bottom-up method

as a follow-on process to the Top-down method.

1.5 Experimental sites

The experiment was set up toinclude geographic areas thatinclude both difficult and easy toenumerate populations. Two siteswere selected believed to haveapproximately one million housingunits and a population of approxi-mately two million persons. Onesite included Baltimore City andBaltimore County, Maryland. Theother site included Douglas, ElPaso, and Jefferson Counties,Colorado. The sites provided a

mix of characteristics needed to

assess the difficulties that might

arise in conducting an administra-

tive records census.

Approximately one half of the test

housing units was selected based

on criteria assumed to be easy-to-

capture in an administrative

records census (for example, areas

having a preponderance of city

style addresses, single family

housing units, older and less

mobile populations), and the other

half was selected based on criteria

assumed to be hard to capture (the

converse).

Figure 1. Summary Diagram of AREX 2000 Design

Clerical Geocoding& Request for

Physical Address

Address Operations

GeocodedAddresses

atBlock level

Addressesand

Personsin Blocks

Select Top-DownBest Addresses

for Persons

MatchAR addresses

to MAF

MatchedAR/MAF

Addresses

UnmatchedAR

Addresses

UnmatchedMAF

Addresses

Top-DownTallies

(“AR - only"Census)

Bottom-UpTallies

(“AR-Censuswith

Enhancements”)

Field AddressVerification

CoverageImprovement

Analogue

Census Pull

NRFUAnalogue

Select Bottom-UpBest Addressfor Persons

EnumerationAnalogue

Test SiteAddresses

Test SitePersons*

BOTTOM UP

TOP DOWN

Natio

nal

StAR

S 99

Addressesand

Personsin Blocks

Test SitePersons*

Page 20: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

10 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

1.6 Administrative RecordsExperiment source files

The administrative records forAREX were drawn from the StARS1999 database. There were sixnational-level source files selectedfor inclusion in StARS. A later sec-tion of this document describesthe source files in detail. The fileswere chosen to provide the broad-est coverage possible of the U.S.population, and to compensate forthe weaknesses or lack of cover-age of a given segment of the pop-ulation inherent in any one-sourcefile. See Section 2 for a descrip-tion of the source file characteris-tics.

Timing

An important limitation for theAREX is the gap between the refer-ence period for data contained ineach source file and the point-in-time reference of April 1, 2000 forthe Census. The time lag has animpact on both population cover-age–births, deaths, immigrationand emigration–and geographiclocation–housing extant, and geo-graphic mobility. As an example,both IRS files include data for taxyear 1998 with an expected cur-rent address as of tax filing timeclose to April 15, 1999. Note,however, that the IRS 1040 fileonly provided persons in the taxunit as of December 31, 1998.The pertinent reference dates foreach of the files are provided inSection 2.

State, Local and Commercial Files

ARRS decided not to use state andlocal files2 and commercially avail-able databases3 in the AREX 2000experiment. Statistical evidence is

limited, but various reports fromARRS indicated that state and localfiles come in an extremely diversevariety of forms, with equallydiverse record layouts and content(for historical information, seeSweet, 1997; Buser, Huang, Kim,and Marquis, 1998; and otherpapers in the AdministrativeRecords Memorandum Series).Furthermore, ARRS reported that itwas quite time-consuming andintricate to develop the intera-gency contractual arrangementsnecessary to use state and localfiles. Public opinion results suchas Singer and Miller (1992), AguirreInternational (1995), and Gellman(1997), convinced ARRS that publicsensitivity to the idea of linkingcommercial databases with govern-ment databases (other than foraddress processing) would be toogreat, and that such a linkagewould be unwise.

Census Numident

An additional, and critical, file usedin creation of the StARS databasewas the Census Numident file. Forthe AREX, it was the source ofmost of the demographic charac-teristics and some of the deathdata. Detailed discussion regard-ing the creation and use of theCensus Numident may also befound in Section 2.

1.7 Administrative RecordsExperiment evaluations

This report is a consolidation offour evaluations of AREX 2000 thathave been prepared by ARRS staff.

The Process Evaluation (Berningand Cook, 2002) documents andanalyzes selected components orprocesses of the Top-down andBottom-up methods in order toidentify errors or deficiencies. It isdesigned to catalogue the variousprocesses by which raw adminis-trative data became final AREX

counts and attempts to identify therelative contributions of these vari-ous processes.

The Request for PhysicalAddress (RFPA) Evaluation(Berning, 2002) assesses theimpact of noncity-style addresses.These addresses present a signifi-cant hurdle to the use of an admin-istrative records census on either asupplemental or substitution basis.A particular problem is the deter-mination of residential addressesand their associated geographicblock level allocation for individu-als whose administrative recordaddress is a P.O. Box or RuralRoute.

The Outcomes Evaluation(Heimovitz, 2002) is a comparisonof Top-down and Bottom-up AREXcounts by county, tract, and blocklevel counts of the total populationby race, Hispanic origin, agegroups and gender, with compara-ble decennial census counts. Thisevaluation is outcome rather thanprocess oriented.

The Household Evaluation(Judson and Bauder, 2002) focuseson household-level comparisonsbetween administrative recordsand Census 2000. It assesses thepotential for NRFU substitution andunclassified imputations, and pre-dictive capability.

1.8 Limitations of theexperiment

In order to achieve a full under-standing of the AREX processesand outcomes, it is important toappreciate the context withinwhich the experiment was carriedout. The AREX was the firstattempt by the Census Bureau toexperiment with the use of admin-istrative records as the foundationof a short form decennial census.Planning for the experiment didnot begin until the end of 1997,

2 Cut-off date is same as dates used todefine universe: persons born after April 2,1972 and on (or before) April 1, 1980.

3 Universe also defined as persons witha death date of 12/31/1989 or later.

Page 21: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 11

which was quite late in the Census2000 development cycle for anexperiment of such complexity.The resources for the experimentwere limited to a part of theAdministrative Records ResearchStaff (ARRS) in the Planning,Research, and Evaluation Division(PRED) with the help of otherdecennial census staff.

Administrative records source files

A consequence of the short plan-ning time and limited resourceswas a number of design and opera-tional decisions that made theAREX 2000 enumeration processquite different from the way suchan enumeration might be carriedout if administrative records wereto be used in some future decenni-al census. Chief among these dif-ferences was the decision to useStARS 1999, the national adminis-trative records database developedby ARRS, as the source of theadministrative records for theexperiment. The administrativerecords source files for StARS 1999neither exhausted the national-level administrative records thatmight have been available for AREX2000 nor were they the timeliest.To cover the population, StARS1999 relied primarily on taxrecords for 1998 received by theInternal Revenue Service (IRS) incalendar year 1999. While IRS taxrecords would have to be the coreof any national administrativerecords database, the coveragedeficiencies are well known–adultswithout tax documents, children oftaxpayers with more than fourdependents, and children of adultswho did not have to file 1040income tax returns. With addition-al time, more could have beendone to obtain administrativerecords from Social SecurityAdministration (SSA) and theCenters for Medicare and MedicaidServices (CMS) that might have

filled these coverage gaps.However, the acquisition of datafrom Federal agencies is a difficult,time consuming, and sometimesexpensive process involving nego-tiations, interagency agreements,data extract specifications, andtesting and validation of the deliv-ered products before such data canbe included in a national database.

Obtaining timelier data for theAREX 2000 would have required, insome cases, the receipt of data ona flow basis from the source agen-cies. Receipt of tax forms filed incalendar year 2000, would haverequired obtaining 1040 data fromIRS on a flow basis and possibly1099 and W-2 data from SSA aswell. (See Section 2 for a discus-sion of the sources and timing oftax data.) Also, more timelyextracts might have been obtainedfrom other contributing agencieshad there been sufficient time tomake the arrangements. As will beevident in this report, the fact thatthe reference period for the admin-istrative data was one or moreyears behind census day (April 1,2000) was the single most impor-tant limitation to the AREX goal oftesting the completeness and accu-racy of an administrative recordscensus.

Using timely administrative recordsdata in a decennial census implieslarge administrative records dataprocessing operations would bedone quickly as part of the decen-nial enumeration. One thinglearned from AREX 2000 is thatsuch processing is technically fea-sible and could be accomplishedwith the planning time andresources that would be availablefor actual census operations asopposed to those typically avail-able for small experiments.

Two experimental sites

A second major limitation imposedby lack of planning time andresources was the restriction of theexperiment to five counties in twostates. Although it would not havebeen reasonable or realistic toattempt to mount this first AREX ina representative sample of geo-graphic areas large enough tomake national estimates, additionalsites would have provided moreconfidence that the results couldbe generalized beyond the sitesselected. While there is much tobe learned from the AREX, it isimportant to keep in mind that forthe AREX results, descriptive statis-tics are generally only representa-tive of the test sites themselves;and the modeling results, thoughsuggestive of the relationshipsbetween administrative recordsoutcomes and their covariates, arenot definitive.

Lack of experimental variation ofkey design parameters

There were several AREX opera-tions relating to address process-ing that could have been morethoroughly evaluated with someadditional structure in the experi-mental design. These operationsinvolved clerical and field attemptsto validate addresses and obtainblock-level geocodes, clerical sup-port for addressing matching ofadministrative records to theMaster Address File, and the “bestaddress” selection algorithm forthe administrative records. In allcases, the objective of the evalua-tion would have been to assess theimpact of the particular operationor algorithm on final address selec-tion and ultimately whether theoperation contributed significantlyto the accuracy of the AREX enu-meration.

Evaluation of the clerical and fieldoperations, individually or in

Page 22: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

12 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

combination, would be importantbecause they represent potentialcostly components were they to beimplemented as part of a nationaladministrative records census.Evaluation of the address selectionalgorithm would have revealed theimpact of the preference ofgeocoded addresses over others inthe algorithm. Unfortunately, theexperimental design did notinclude factorial or fractional facto-rial structure permitting direct esti-mates of the impact of operationalcomponents, individually or incombination.

Race and Hispanic origin models

Population tallies by race andHispanic origin are a crucial prod-uct of the short form censusbecause of their use in drawingand evaluating political districts atand below the state level.Measurement of race and Hispanicorigin is a major weakness ofadministrative records at the

national level and any attempt touse administrative records to enu-merate all or part of the populationwould have to find some way ofimproving the information avail-able in administrative records.

In his design proposal for anadministrative record census in2010, Bye (1997) suggested build-ing a list of SSNs annotated byrace and Hispanic origin by aseries of operations that wouldbegin by matching Census 2000 toSSA’s Numident and continue dur-ing the years leading up to the2010 Census. (See also Bye andThompson (1999). Sections 4 and5 of this report describe activitiescurrently underway at the CensusBureau.) Had more planning timeand resources been available to theAREX, it might have been possibleto incorporate race roster buildinginto the experiment by includingone or more of the 1995 and 1996census test sites or Census 2000

Dress Rehearsal sites in the AREX(Bye, 1997).

However, such race roster buildingwas not available to the AREX, andARRS decided to use Numident-based national-level models to aug-ment the race and Hispanic origindata (Bye 1998). Although usingthe models generally worked inaggregate counts, the use ofnational-level models to imputecharacteristics of small geographicareas has certain well-known weak-nesses in that the actual findingsin the smaller areas can vary sub-stantially around the national pre-dictions. Bye (1998) provided tab-ulations for states and somesubstate areas showing the kind ofvariation that could be expectedwhen using the national modelsfor the AREX. Bye and Thompson(1999) provided a partial solutionto this problem, but an annotatedNumident file is clearly a superiorsolution.

Page 23: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 13

2.1 Introduction

This section describes and evalu-ates the AREX enumerationprocesses. The process descrip-tion is taken largely from theprocess evaluation report ofBerning and Cook (2002). In thisreport, process descriptions havebeen provided separately for theTop-down and Bottom-up enumera-tions. The actual data processingflows were often intermingled andare provided by Berning and Cookin great detail. Concerningprocess evaluation, Berning andCook focused mainly on data pro-cessing and clerical operations.

Administrative records

AREX source files

The administrative records forAREX were drawn from the StARS1999 database. The six national-level source files selected for StARSwere chosen to provide the broad-est coverage possible of the U.S.population. At a minimum, thefiles had to have for each record, aname, Social Security Number(SSN), and street address.

The national level files that con-tributed to the StARS 1999 data-base and therefore to AREX 2000,were:

Internal Revenue Service (IRS)Tax Year 1998 Individual MasterFile (IMF 1040),

IRS Tax Year 1998 InformationReturns Master File (IRMF W-2/1099),

Department of Housing andUrban Development (HUD) 1999Tenant Rental AssistanceCertification System (TRACS)File,

Center for Medicare andMedicaid Services (CMS) 1999Medicare Enrollment Database(MEDB) File,

Indian Health Services (IHS)1999 Patient RegistrationSystem File, and

Selective Service System (SSS)1999 Registration File.

The following table displays theprimary reason each file wasincluded in the StARS database and

the approximate number of inputrecords associated with each.

Timing

An important limitation for theAREX was the gap between the ref-erence period for data contained ineach source file and the point-in-time reference of April 1, 2000 forthe Census. The gap had animpact on both population cover-age (births, deaths, immigrationand emigration) and geographiclocation (housing extant, and geo-graphic mobility). As an example,the IRS 1040 file included data fortax year 1998 with an expectedcurrent address as of tax filingtime close to April 15, 1999, butprovided only persons in the taxunit as of December 31, 1998.

The following table displays thereference periods of the files avail-able. Generally, the reference peri-ods are about one year prior to theday of Census 2000.

Census Numident

An additional, and critical, file usedin creation of the StARS database

2. Administrative Records ExperimentProcess Evaluation

Table 1.Source File Characteristics

File Targeted population segmentNumber of

address records(millions)

Number ofperson records

(millions)

IRS 1040 Taxpayer and other members of the reporting unit with current address 120 243IRS W2/1099 Persons with taxable income who might not have filed tax returns 598 556HUD TRACS Low income housing population (possible non-taxpayers) 3.3 3.3Medicare File Elderly population (possible non-taxpayers) 57 57IHS File Native American population (possible non-taxpayers) 3.1 3.1SSS File Young male population (possible non-taxpayers) 14.4 13.1

Total 795 million 875 million

Notes: The variance between the number of address records and person records within the input source files is a result of the following source file charac-teristics: (1) Each IRS 1040 input record may reflect up to six persons (primary filer, secondary, and dependents). (2) Each SSS input record may reflect twoaddresses - defined as current and/or permanent address.(3) The IRS W-2/1099 file undergoes a preliminary unduplication and clean-up process prior to theinitial file edit process.

Page 24: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

14 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

was the Census Numident file. For

the AREX, it was the source of

most of the demographic charac-

teristics and some of the death

data.

The Census Numident was created

by ARRS for the primary purpose

of validating Social Security

Numbers (SSNs) used in the pro-

cessing of administrative records

and supplying demographic vari-

ables missing from source files.

The Census Numident is an edited

version of the Social Security

Administration’s (SSA) Numerical

Identification (Numident) File. The

SSA Numident file is the numerical-

ly ordered master file of assigned

Social Security Numbers (SSN) that

may contain up to 300 entries for

each SSN record, although on aver-

age contains two records per SSN.

Each entry represents an initial

application for a SSN or an addition

or change (referred to as a transac-

tion) to the information pertaining

to a given SSN. The SSA Numident

contains all transactions (and

therefore, multiple entries) ever

recorded against a single SSN. The

SSA Numident available for StARS

1999 reflected all transactions

through December 1998.

The Census Numident wasdesigned to collapse the SSANumident entries to reflect “onebest record” for each SSN contain-ing the “best” demographic datafor each SSN on the file. Followingedit, unduplication, and selectionof best demographics, the SSANumident file of nearly 677 millionrecords was reduced to just over396 million records that comprisethe Census Numident file.

2.2 Top-down enumeration

Dual stream process

The goal of the Top-down processwas to use administrative recordsto identify individuals residing atgeocoded addresses in the AREXtest sites and to construct a datarecord for each individual that con-tained demographic data (age, gen-der, race and Hispanic origin) cor-responding as closely as possibleto census short form data. Toachieve this goal, a “dual-stream”processing approach was adopted.One processing stream concernedthe development of a uniquerecord for each individual withbest demographics. The secondstream involved development of anunduplicated set of addresses,geocoded to the block level. In the

end, persons and addresses werebrought together, and a bestaddress was selected for each per-son to complete the Top-down enu-meration.

The following sections provide abrief description of the AREX Top-down data processing steps. Muchof the work was accomplished inthe development of StARS 1999itself, but there were some differ-ences in demographic and addressselection rules. More detail isgiven in Berning and Cook (2002).

Top-down Person processing con-sisted of three main steps.

1. File edits for person data,

2. SSN Verification of personrecords,

3. Unduplication of person records,and creation of the PersonCharacteristics File (PCF) thatcontained the “best” demograph-ic characteristics for each per-son record.

Models were used to generate“best” demographic characteristics.Details about the models can befound in Bye (1998, race/Hispanicorigin)4, and Thompson (1999,gender)5. In general, a person’smodeled race or gender was usedonly in the case where no raceappeared on any administrative

Table 2.Reference Dates of Source Files

Source file Cut-offdate

Requestedcut date Universe

Indian Health Service . . . . 04/01/99 04/01/99 All persons alive at cut-off date

Selective Service . . . . . . . . Note 2 04/01/99 Males between the age of 18-25

HUD TRACS . . . . . . . . . . . 04/01/99 904/01/99 All persons on file as of cut-off date

Medicare . . . . . . . . . . . . . . . Note 3 04/01/99 All persons alive at cut-off date

IRS 1040. . . . . . . . . . . . . . . 12/98 Note 1 Individual tax returns for tax year 1998

IRS W-2 /1099 . . . . . . . . . . 12/98 04/01/99 Forms W-2 and all 1099 forms taxyear 1998

Note 1: File Cut date is for posting cycle weeks 1-39 only for IRS 1040, and weeks 1-41 for IRS1099 files. Weeks 40-52 (and 42-52 respectively) were not included in StARS ’99. This file reflects themost current address on file for the taxpayer. It could be an address that has been updated since the1998 tax return was posted.

Note 2: Cut-off date is same as dates used to define universe: persons born after April 2, 1972 andon (or before) April 1, 1980.

Note 3: Universe also defined as persons with a death date of 12/31/1989 or later.

4 The Race and Hispanic Origin modelswere developed using Numident data andSpanish and Asian name lists. The principalvariables in the prediction equations were:(1) race or Hispanic origin as it appeared inthe Numident, (2) place of birth, (3) Spanishand Asian surname indicators for the SSNholder and parents' surnames, and (4) indi-cator field in the Indian Health Service file.The Race and Hispanic Origin models wereoriginally developed to augment race andHispanic origin information in the Numident.

5 The gender model was based on thestrength of association between first andmiddle names and reported sex. Look-uptables created for common names, uncom-mon names, name-gender proportions, andgender model parameters were created anda final gender probability assigned after thefour look-up tables were created and runagainst each input record.

Page 25: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 15

record, including the Numident. Inthe case of gender, the model wasrarely used since the Numidentreported gender more than 99 per-cent of the time. For Race, themodel was used when theNumident race was shown as“Other” or “Unknown” or“Hispanic,” and no other adminis-trative record provided it. Thevast majority of cases withunknown race were either childrenwhose applications for SSNs wereprocessed via SSA’s enumeration-at-birth program, which was startedin the mid-1980s, or older personswho had applied for Social Securitybenefits prior to SSA’s developmentof the electronic Numident in themid-1970s.

For Hispanic Origin, the model wasused for all cases for which neitherthe Numident nor any of the otheradministrative records indicatedHispanic origin. Because theNumident did not capture Hispanicorigin prior to 1980, the modelwas used for well over 90 percentof the cases. The following tableshows the extent of Race and

Hispanic Origin imputation for theindividuals included in the Top-down AREX enumeration.

Top-down Address Processing

Top-down Address Processing con-sisted of four main steps.

1. File edits for address data,

2. Code-1 processing and comput-er geocoding the addressrecords,

3. Manual geocoding for addressesnot coded by computer,

4. Creation of Master Housing Filefor administrative recordaddresses.

The creation of the Master HousingFile for administrative recordaddresses was the final step in theaddress processing before theaddresses were relinked with theperson records. This step had twomain objectives. First there was anattempt to identify commercialaddresses in the files. Second,there was a final attempt to undu-plicate the addresses prior to the

application of address selectionrules.

Clerical Geocoding and Request forPhysical Address (RFPA)

Addresses that cannot be geocod-ed by computer generally fall intothree categories: (1) city styleaddresses; (2) P.O. Box and non-city style addresses (ruralroute/box number); or (3) address-es that are so fragmented that theycannot be classified. Proceduresfor attempting to obtain geocodesfor the first two classes of address-es are described below. Seriouslyfragmented addresses are discard-ed at this point.

Master Address File GeocodingOffice Resolution (MAFGOR)

MAFGOR was an existing opera-tional capability within theRegional Census Centers (RCC) toprovide clerical geocoding for theDecennial Master Address File aspart of Census 2000. Addressesidentified by ZIP code as beingpotentially in the AREX sites butnot geocoded by computer weresent to the Philadelphia RCC(79,307) and the Denver RCC(83,841). These two RCCsattempted to clerically geocodethese addresses using trainedstaff, reference materials, andmaps. The clerical geocodingadded about 3 percent to the totalnumber of addresses coded.

Request for Physical Address(RFPA)

P.O. Box and rural route/box num-ber addresses pose a special chal-lenge for geocoding. The P.O. Boxaddress does not refer to a physi-cal location and the non-city styleaddresses often do not preciselyidentify the housing unit location.The RFPA was an attempt to collectphysical addresses (house numberand street name) for personsreceiving mail at these potential

Table 3.Percent of Cases With Imputed Race or Hispanic Origin byAge and County

CountyImputed race Imputed Hispanic origin

<18 18 and over <18 18 and over

Baltimore City. . . . . . . . . . . 40.7 2.8 99.4 96.7Baltimore County . . . . . . . . 49.3 4.0 99.2 98.5Douglas, CO. . . . . . . . . . . . 58.2 6.0 98.3 97.7El Paso, CO . . . . . . . . . . . . 52.2 9.0 93.9 93.9Jefferson, CO . . . . . . . . . . . 54.8 8.0 94.7 95.2

Table 4.StARS 1999 and AREX Test Site Geocoding Tallies

ItemNumber input

records togeocoding

Number ofrecords

geocodedPercent

geocoded

StARS National Address File . . . . . . . . . . . . 147,346,145 108,032,169 73.3Maryland subset of StARS National File . 725,108 626,247 86.4Colorado subset of StARS National File. . 624,248 498,783 79.9

Page 26: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

16 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

test site addresses. Major compo-nents of the operation were to:

• Create an address file fromadministrative records wherethe mailing address was a PostOffice Box or noncity-styleaddress.

• Design and mail a form request-ing physical address informa-tion.

• Have the RCCs attempt to cleri-cally geocode the physicaladdresses of the returned formsto state, county and block.

• Key addresses and geocodeinformation to a file for furtheranalysis.

The mailing was sent to 58,151addresses associated with 138,653individuals. For a number of rea-sons, the response rate to themailout was only about 20 percentof which about 86 percent (9,431physical addresses) were geocod-ed, 8,090 to an AREX test sitecounty. The coded addresses wereto have been added to the addresslists prior to AREX address selec-tion. However, because of thesmall number of persons thatwould have been potentially addedto the enumeration or for whomaddresses might have changed,these addresses were not incorpo-rated into the AREX address file.As indicated above, the RFPA was

the subject of a special evaluation.More can be found in Berning(2002).

Table 5 provides a summary ofTop-down address coding. Notethat only about 3,000 addresseswere too fragmented to be eligiblefor either MAFGOR or RFPA.

AREX Master Housing File

The AREX Master Housing File(MHF) contained an unduplicatedset of non-commercial addressrecords that was linked with theperson records prior to the applica-tion of the best address selectionalgorithm.

AREX Top-down composite personrecords (CPR)

At this point in the AREX “dualstream” process, address and per-son data were brought together inpreparation for creation of theComposite Person record. Therewere two principal tasks. First,individuals potentially in the AREXtest sites were identified. Then,the best address was selected forthese persons. If the best addresswas in the test site, then the indi-vidual became part of the Top-down enumeration.

The development of the AREX per-son universe began with thenational databases of persons andaddresses described in the previ-ous sections. First, all persons

ever associated with an AREXaddress were included in a file ofpotential AREX persons. Next, allof the addresses associated withthese persons–addresses both inand outside of the test site–wereassembled and subjected to thefollowing selection algorithm.

• Select geocoded addresses overnon-geocoded addresses.

• Select the highest HUID categoryavailable.

• Select a non-proxy address overan address with a proxy.

• Select a non-commercial addressover a commercial address.

• Select the address based onsource file priority as follows:

IRS 1040 recordMedicare recordIndian Health Service

recordIRS 1099 recordSelective Service recordHUD TRACs recordSelect the most recent

record based on theadministrative recordcycle dates.

Select the first record read-in to the processing array for output to theCPR.

If the best address for any per-son record from among theAREX person universe file was

Table 5.AREX Administrative Record Address Geocoding Results

State Addressesin test sites

TIGERcoded

Not TIGERcoded

Eligible forMAFGOR

Coded byMAFGOR

Maryland . . . . . . . . . . . . . . . 725,108 626,247 98,861 79,307 21,542Colorado . . . . . . . . . . . . . . . 624,248 498,783 125,465 83,841 28,030

Total. . . . . . . . . . . . . . . 1,349,356 1,125,030 224,326 163,148 49,572

Not eligiblefor MAFGOR

Eligible forRFPA

Returned withuseable information

Coded toAREX site

Not eligible forMAFGOR/RFPA

Maryland . . . . . . . . . . . . . . . 19,544 18,694 3,538 1,939 860Colorado . . . . . . . . . . . . . . . 41,624 39,457 8,145 6,151 2,167

Total. . . . . . . . . . . . . . . 61,168 58,151 11,683 8,090 3,027

Page 27: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 17

determined not to be within the

AREX test site, the person record

was flagged “out of scope” to

ensure the person was not

counted in the population tallies

for the AREX test site.

Top-down process results

The composite person record rep-

resents the completion of the Top-

down process for the AREX 2000

experiment. Prior to tabulation, a

final match of the AREX addresses

was made to the Decennial Master

Address File (DMAF) for the pur-

pose of transforming the collection

geography to tabulation

geography6. Because the AREX

addresses were initially geocoded

to collection geography, it was nec-

essary to translate the collection

geographic codes into the tabula-

tion geographic codes so that the

comparisons to Census 2000 tabu-

lations could be made.

The tallies for the top downmethod are shown in Table 6.

The counts by age showed theexpected results. Generally,administrative records undercount-ed the population; but coverage ofadults (89 percent - 96 percent)was much better than children (72percent - 83 percent). There is anevaluation of the administrativerecords data sources and Top-downprocessing tasks in Bye 2002.

2.3 Bottom-up enumeration

The weaknesses of the Top-downprocess as exhibited above werenot unexpected. In fact, most his-torical proposals for an administra-tive records census recognizedthat additional operations, beyondtallies of administrative records,would have to be performed for acomplete enumeration to beobtained.

The Bottom-up phase of the AREX2000 design was an attempt tocomplete the administrative-records-only enumeration byadding persons missed in theadministrative records, a processanalogous to a conventional nonre-sponse followup (NRFU). Therewas also an attempt to correct Top-

down enumeration errors by

removal of invalid administrative

records addresses prior to best

address selection. A valid address

was defined as one that matched

the DMAF or was deemed valid

after a field address review. There

was no provision for correcting

enumerations at households with

valid administrative records

addresses. Non-matched DMAF

addresses were canvassed in order

to enumerate persons at addresses

not found among the validated

administrative records addresses.

In the AREX, the canvassing was

simulated by adding those persons

found in Census 2000 at the

unmatched addresses to the

adjusted administrative-records-

only counts, thus completing the

enumeration. This phase of the

AREX was designated as Bottom-up

because it started with a known

list of residential addresses (in this

case the DMAF), matched the

administrative records addresses

to such a list, and reconciled any

non-matched cases.

The Bottom-up operational compo-

nents of AREX were conducted on

records contained within the five

test site counties. These opera-

tions consisted of:

• Computer matching AREX

addresses to the DMAF.

• Clerical review of unmatched

administrative record

addresses.

• Field Address Verification of

unmatched administrative

record addresses.

• Address re-selection.

• Census Pull, the simulated

NRFU.

• Bottom-up enumeration.

Table 6.Top-down Population Tallies

Test site county AREXpopulation

Census2000

population

Percent ofCensus

population

Baltimore City Maryland . . . . . . . . . . . . . . . . . . . 570,648 651,154 88Under 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134,471 161,353 8318 and over. . . . . . . . . . . . . . . . . . . . . . . . . . . . 436,127 489,801 89

Baltimore County Maryland . . . . . . . . . . . . . . . . 696,183 754,292 92Under 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146,012 178,363 8218 and over. . . . . . . . . . . . . . . . . . . . . . . . . . . . 550,086 575,929 96

Douglas County Colorado . . . . . . . . . . . . . . . . . 148,270 175,766 84Under 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40,085 55,477 7218 and over. . . . . . . . . . . . . . . . . . . . . . . . . . . . 108,165 120,289 90

El Paso County Colorado . . . . . . . . . . . . . . . . . . 456,891 516,929 88Under 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110,504 142,480 7818 and over. . . . . . . . . . . . . . . . . . . . . . . . . . . . 346,322 374,449 92

Jefferson County Colorado . . . . . . . . . . . . . . . . 473,495 527,056 90Under 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101,535 133,486 7618 and over. . . . . . . . . . . . . . . . . . . . . . . . . . . . 371,894 393,570 94

6 The taking of the census spansapproximately a two year period, includingthe address list building phase. The geo-graphic framework going into the census iscalled collection geography. Prior to tabula-tion of the final Census counts, changesmust be incorporated to reflect boundariesin effect on January 1, 1999. This final geo-graphic framework is called "tabulation"geography.

Page 28: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

18 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Matching AREX records to theDMAF

The DMAF Computer Match

The objective of the computermatch operation was to determinethe extent and nature of agree-ment between addresses fromadministrative records source filesand eligible addresses from theCensus Bureau’s Decennial MasterAddress File (DMAF). To most accu-rately match the addresses, theAREX addresses were limited tothose, which were geocoded, orwith a standardized street name, astandardized property descriptionor both. Excluded from the match-ing process were non-standardizedaddresses, standardized post officeor box addresses, standardizedpost office and rural route address-es and undefined addresses. Table7 shows the administrative recordsaddresses and the DMAF addresseseligible for the computer match.

The matching process usedAutoMatch, a commercial softwarepackage that applies probabilisticrecord linkage techniques. Thefinal results were divided intomatches; possible matches; non-matches and matches to duplicateDMAF addresses. Table 8 showsthe results of the computer matchfor the administrative recordsaddresses.

Some administrative recordsmatched to more than one addressin the DMAF, each of which mighthave had subtle differences. Whenthis occurred, addresses wereflagged as having duplicate match-es. The duplicates were resolvedlater in the AREX operation wherethe best address was determinedbased on pre-defined criteria.

Clerical Review of UnmatchedAdministrative Records Addresses

Following the computer match, the

staff at the National Processing

Center conducted a clerical review.

The results of the clerical review

are shown in Table 9.

Field address verification (FAV)

The Field Address Verification

operation was implemented to

check the validity of addresses that

remained unmatched to the DMAF

following the computer matching

and clerical review. The purposes

of the FAV were to:

• Verify the physical existence or

nonexistence of non-matched

AREX 2000 Test Site addresses.

• Correct erroneous address field

values.

• Identify addresses meeting

unique conditions such as being

a duplicate of another address.

The original plan called for a

review of 100 percent of the

unmatched addresses by census

field staff, but the plan was

changed to have only a sample of

addresses reviewed by Census

Bureau Headquarters volunteers.

The results from the sample were

used to estimate a regression

equation giving the probability of a

valid address. The equation was

then used to impute validity or

Table 7.Addresses Eligible for the Match to the DMAF

Test Site

Addresses from AdministrativeRecords TIGER/MAFGOR Unduplicated

DMAFAddressesTotal Coded Non-coded

Maryland . . . . . . . . . 656,073 647,789 8,284 650,109Colorado . . . . . . . . . 531,382 526,813 4,569 526,018Total . . . . . . . . . . . . . 1,187,455 1,174,602 12,853 1,176,127

Table 8.Computer Match Results—Administrative Records Counts

Test Site

NumberRecords toComputer

Match

Number ofAddresses

Matched

Percent ofAddresses

Matched

PossibleMatchedRecords

Non-MatchedRecords

DuplicateMatches

Maryland . . . . . . 656,073 525,234 80 2,134 128,286 419Colorado . . . . . . 531,382 432,140 81 9,430 88,586 555Total . . . . . . . . . . 1,187,455 957,374 81 11,564 216,872 974

Table 9.Clerical Review Match Results

Test Site Recordssent to

ComputerMatch

MatchedRecords

beforeClericalReview

MatchedRecords

afterClericalReview

Percent ofMatchedRecordsmatched

by ClericalReview

Maryland . . . . . . . . . . . . . . . . . . . . . . . . . . 656,073 525,234 543,811 3Colorado . . . . . . . . . . . . . . . . . . . . . . . . . . 531,382 432,140 459,753 5Total . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1,187,455 957,374 1,003,564 4

Page 29: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 19

lack thereof to the non-sampleaddresses.

Sample design

After the computer phase ofaddress matching, the universe ofaddresses eligible for Field AddressVerification was first restricted togeocoded, city-style addresseswithin the AREX 2000 test sitecounties. The universe was furtherrestricted to exclude some AREX2000 test site ZIP codes thatbelonged to three colleges, a med-ical center, and an Air Force basein the belief that few or no residen-tial addresses existed in thoseareas.

With the redesign of the FAV opera-tion, the addresses to be verifiedwere based on a stratified cluster(Census block) sample ofunmatched, city style addresses.The sample consisted of 112blocks per AREX county and result-ed in 6,644 addresses beingflagged as part of the FAV sample(table 10).

After the fieldwork was completedand the results keyed, PRED staffthen reviewed each of the listingpages and annotated a 5-digit sta-tus code on the page. The code

categorized the type of activityabout the address that was shownon the listing page and the validityof the address. In some instances,addresses were determined to bevalid as listed (without changes).In other cases, corrections weremade to the address to make theaddress valid. Table 11 providesthe FAV sample results.

Of particular interest in this tableare the percentages of addressesdetermined to be valid as listed.Because these addresses did notmatch the DMAF even after clericalreview, it is possible that the DMAFwas incomplete. However, thismay also reflect residual difficultiesin the matching process.

Imputing validity to non-sampleaddresses

The FAV sample cases were used toestimate a logistic regressionmodel, logit(P(y=1|x))= xβ. In thisequation, the outcome measure y =1 if the address was valid, y = 0otherwise. The predictor variables,x, represented (1) characteristics ofthe administrative record address-es as possibly modified by the FAVreview, (2) DMAF block size of theDMAF address to which the admin-

istrative record partially match,and (3) the nature of the partialmatch. Generally, administrativerecords addresses were foundmore likely to be valid if they werenot commercial, were found inmultiple administrative recordsource files, had no unit identifier,and matched a DMAF address oraddresses (by state, county, zip,street name, and street name suf-fix) for which there were no unitidentifiers (i.e., it or they appearedto be a single-family dwelling ordwellings), and were located inblocks in which the DMAF indicat-ed a fairly large number ofaddresses. The overall probabilityof misclassification–the probabilitythat an address was not validtimes the probability of a falsepositive plus the probability thatan address was valid times theprobability of false negative– wasestimated to be 0.32. (A detaileddiscussion of the model andregression results can be found inBye, 2002.)

Validity or lack thereof was imput-ed for all FAV eligible addressesthat were not part of the sampleby using the regression equationto calculate the probability that theaddress was valid, and comparingthis value to a random numberdrawn from a uniform distributionbetween 0 and 1. If the randomnumber was less than or equal tothe predicted probability, theaddress was deemed to be validfor AREX Bottom-up address selec-tion purposes.

The net results of the Bottom-upadministrative records addressesprocessing – match of AREXaddresses to the DMAF and thesubsequent FAV – are given in thefollowing table.

As a result of the FAV operations,93,382 (153,535 - 59,971) of theFAV-eligible administrative records

Table 10.Selection of FAV Addresses

Test site state Number of FAVeligible addresses

Number of addressesselected for FAV sample

Maryland . . . . . . . . . . . . . . . . . . . . . 96,202 2,914Colorado . . . . . . . . . . . . . . . . . . . . . 57,333 3,730Total . . . . . . . . . . . . . . . . . . . . . . . . 153,535 6,644

Table 11.FAV Sample Results

Test siteNumber ofaddresses

sampledPercent

valid

Percentvalid as

listed

Percent validafter lister

corrections

Maryland . . . . . . . . . . . . . . . . . . . . . 2,914 38 13 25Colorado . . . . . . . . . . . . . . . . . . . . . 3,730 41 7 34Total . . . . . . . . . . . . . . . . . . . . . . . . 6,644 40 10 30

Page 30: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

20 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

addresses were found to beinvalid, and were not eligible forthe Bottom-up address selection.Note, however, that unmatchedaddresses not eligible for the FAVremained in the Bottom-up addresspool as a possible Bottom-upaddress.

Second DMAF match

A second match of the AREXaddresses was made to the DMAFfor the purpose of transformingthe collection geography to tabula-tion geography. Because the AREXaddresses were initially geocodedto collection geography, it was nec-essary to translate the collectiongeographic codes into the tabula-tion geographic codes so that thecomparisons to Census 2000 tabu-lations could be made. In general,the difference between collectionblocks and tabulation blocks wasthat some collection blocks weresplit in final decennial census tal-lies.

The contents of the DMAF were notstationary between the first andsecond match. There were a num-ber of problems with duplicateMAFIDs for different addresses andmultiple MAFIDs for the sameaddress in both DMAF matches.Sometimes administrative recordsthat matched the first time did notmatch the second. In these cases,if the original collection block wassplit into more than one tabulationblock, then the address was statis-

tically allocated to a tabulationblock. (The second DMAF matchalso had an impact on Top-downblock assignments.)

Bottom-up address selection andcomposite person records

For an address to be consideredeligible for Bottom-up selection,the following conditions had to bemet after the rematch to the DMAF:

1. The address had to have aCensus tabulation block code.

2. The address could not havebeen identified as a commercialaddress during the FAV.

3. The address had to be eithernon-FAV eligible, a FAV sampleaddress that was found to bevalid during field review,deemed valid based on FAVimputation, or valid based onmatching during the rematch tothe DMAF.

Once the pool of eligible addresseswas identified and linked to theunduplicated list of AREX persons,the address selection operationswere similar to the top down selec-tion and identification of personsin administrative records who wereeligible for the Bottom-up enumer-ation were similar to those proce-dures used for Top-down selection.Generally, all the addresses associ-ated with an individual wereassembled and subjected to theaddress selection rules to obtain

the “best” address for each individ-ual in the administrative recordsource files.

A possible outcome of the addressselection process was that no per-sons remained at valid addressesin the AREX test sites. That is,although one or more personswere originally associated with theadministrative record address, bestaddress selection resulted in allpersons at the address beingassigned to another test siteaddress or to an address outsidethe test site. These addresseswere designated as AREX vacantaddresses; there were 179,523such addresses.

Simulated NRFU – the Census Pull

A principle feature of the Bottom-up process was to complete theenumeration by adding persons attest site addresses not found inadministrative records.Presumably this would have beenaccomplished by some sort ofmailout/mailback procedure orface-to-face interviews or both.This can be considered as an ana-logue to a conventional nonre-sponse followup; albeit in thiscase, the “nonresponse” is to theinitial administrative records enu-meration.

For the AREX, the NRFU analoguewas simulated by including per-sons found in the Census 2000Hundred Percent Detail File (HDF)at addresses that were not foundamong the administrative recordsaddresses, occupied or vacant.These were persons enumerated inCensus 2000, and the assumptionwas that they would have beencounted in the AREX had some sortof followup been instituted. Theprocess of including persons fromthe Census 2000 HDF was referredto as the Census Pull.

Table 12.Bottom-up Administrative Records Address Processing

Test siteAddresses

sent toDMAF

computermatch

Matchedaddresses

after DMAFcomputer

match andclericalreview

Nonmatchedaddresses

FAV eligibleaddresses

Numberof valid

addresses ofthose eligible

for FAV(FAV sample

or imputed)

Maryland . . . . . . 656,273 543,881 112,392 96,202 36,661Colorado . . . . . . 531,382 459,753 71,629 57,333 23,310Total . . . . . . . . . 1,187,655 1,003,634 184,021 153,535 59,971

Page 31: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 21

Table 13 shows the number of

Census Pull addresses and persons

included in the Bottom-up enumer-

ation.

Revised race imputation for chil-

dren under 18

Instead of using the race model to

impute race for children under age

18 with unknown race in adminis-

trative records as was done in the

Top-down process, an alternative

imputation method was used. The

source of most children in the

AREX was the IRS 1040 file, which

generally provided primary and

secondary tax payers and up to

four dependents in each tax unit.

Children under 18 with unknown

race, who could be associated with

a tax unit, were assigned the race

of the primary taxpayer.

Bottom-up results

Overall enumeration

The AREX Bottom-up enumerationresults are shown in Table 14. Asexpected, the coverage is muchimproved compared to the Top-down counts, and is largely due tothe completion of the Top-downenumeration by the Census Pull.Specifically, the Bottom-up cover-age of children (81 percent - 94percent across the test sites) issubstantially better than the Top-down (72 percent - 83 percent).Adults in the Bottom-up are moreor less uniformly overcounted (102percent - 104 percent). The over-count of adults most likely is dueto unaccounted for deaths in theprevious 12 months, handling ofspecial populations, and failure tounduplicate persons after theCensus Pull (discussed later in thereport). Of course, the latter

means that there is some duplica-tion for the children as well.

Net effect of Bottom-up processeson administrative records tallies

Two of the Bottom-up operationsentailed an attempt to improve theadministrative records addressesprior to Bottom-up “best” addressselection: (1) the initial match tothe DMAF and its followup clericalreview, and (2) the FAV7. Theimpact of these operations on theadministrative records part of theBottom-up enumeration was three-fold. First, administrative recordsaddresses were removed from con-sideration if they did not match theDMAF, were FAV eligible but werenot found or deemed to be validby the FAV. Second, addresses thatwere not geocoded in the TIGERmatch or MAFGOR might havebeen coded through one or theother of these operations. Third,some of the addresses that weregeocoded prior to the initial DMAFmatch might have received codechanges.

The impact of these operations onthe addresses has been discussedin the relevant sections. Table 15provides some information on thenet impact of these operations onBottom-up person tallies fromadministrative records, and pro-vides a comparison of the Top-down and Bottom-up administra-tive records person tallies.

Of the 2.3 million persons talliedin the Top-down enumeration,70,031 (about 3 percent) wereexcluded from the Bottom-upadministrative record counts.These exclusions occurred eitherbecause the only address that thepersons had was rejected by theBottom-up processes or because

Table 13.Census Pull Results

State Census 2000Addresses

Census PullAddresses

Census PullPersons

Maryland . . . . . . . . . . . . . . . 615,323 97,460 185,868Colorado . . . . . . . . . . . . . . . 478,701 55,319 126,558Total . . . . . . . . . . . . . . . . . . . 1,094,204 152,779 312,426

Table 14.Bottom Up Method Population Tallies

Test Site County AREXPopulation

CensusPopulation

Percent ofCensus

Population

Baltimore City Maryland . . . . . . . . 661,561 651,154 102Under 18 . . . . . . . . . . . . . . . . . . . . . 151,411 161,353 9418 and over . . . . . . . . . . . . . . . . . . 510,109 489,801 104Baltimore County Maryland . . . . . 745,893 754,292 99Under 18 . . . . . . . . . . . . . . . . . . . . . 154,500 178,363 8718 and over . . . . . . . . . . . . . . . . . . 591,313 575,929 103Douglas County Colorado . . . . . . 170,102 175,766 97Under 18 . . . . . . . . . . . . . . . . . . . . . 46,394 55,477 8418 and over . . . . . . . . . . . . . . . . . . 123,689 120,289 103El Paso County Colorado . . . . . . 509,597 516,929 99Under 18 . . . . . . . . . . . . . . . . . . . . . 121,647 142,480 8518 and over . . . . . . . . . . . . . . . . . . 387,888 374,449 104Jefferson County Colorado . . . . . 508,254 527,056 96Under 18 . . . . . . . . . . . . . . . . . . . . . 108,618 133,486 8118 and over . . . . . . . . . . . . . . . . . . 399,575 393,570 102

7 The second match to the DMAF had animpact on address selection for both theTop-down and Bottom-up and should not beconsidered solely a Bottom-up operation.

Page 32: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

22 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

the only remaining addresses wereoutside of the AREX test sites.

For administrative records personsenumerated in both the Top-downand the Bottom-up, Table 15 pro-vides information on the change ingeographic location due to Bottom-up processes. Here, there seemsto have been very little impact;over 99 percent of these personswere at the same address in bothenumerations.

Overall, the net effect of theBottom-up operations on theadministrative record tallies wasquite modest.

Bottom-up evaluation

The Bottom-up evaluation focusedon both operations and the goalsof a decennial short form census.

AREX Bottom -up processing opera-tions

DMAF computer match

The computer match rate betweeneligible AREX addresses and theDMAF was only about 80 percent.A number of factors may have con-tributed to the match rate level.First, there is the vintage of theadministrative record addresses.Most of the AREX addresses wereof 1999 vintage, one year or olderthan the DMAF. Destruction ofhousing units and changes in offi-cial address components couldaccount for some of the non-matches.

Second, although a number ofadjustments were made to theAutoMatch parameters to try toensure optimum match rates, theclerical review following the com-puter match resulted in a substan-tial number of additional matchessuggesting that there is still roomfor improvement in the use ofmatching software. The fact thatmost of the unmatched addresseswere geocoded by TIGER or MAF-GOR suggests just how difficultaddress matching can be.

Third, a more consistent method ofaddress standardization shouldimprove the overall match rate.Throughout the course of creatingthe StARS database and subse-quent iterations of the AREXaddress file, the GeographyDivision’s address standardizer wasemployed. The dynamic nature ofthe standardizer software programand the flexibility of operator con-trol during its application mostlikely contributed to inconsisten-cies and variances that led to erro-neous non-matches (and matchesas well). Although difficult toquantify, the application of a fixedversion of the standardizer alongwith prescribed operator controlmethodologies should improve theoverall match rate during the com-puter matching operations.Improving the computer matchrate would, in turn, reduce thenumber of address records requir-ing clerical review.

Finally, multiple MAFIDs assignedto a single address and duplicateMAFIDs assigned to multipleaddresses contributed to the diffi-culty in classifying an address asmatched, non-matched, or possiblymatched. These difficulties maybe due to the Census Bureau’smethodology and audit trail foridentification and retention of “sur-viving MAFIDs” on the DMAF as theDMAF changes over time. Furtherresearch needs to be done on thebest formulation of DMAF extractsfor administrative record matching.

Second DMAF match

Prior to the AREX enumeration, asecond match to the DMAF wasrequired to pick up “tabulation”block codes. The block codesobtained from the original TIGERmatch and MAFGOR operationwere “collection” block codes. Thedifference between the codes isthat some collection blocks weresplit as part of a final decennialcensus-coding scheme. The AREXneeded to use the final blockcodes in order to facilitate compar-isons between AREX and Census2000 results. Addresses that didnot match the second time andwere in collection blocks that hadbeen split by one or more tabula-tion blocks were statistically allo-cated to one of the split blocks.

Clerical review of non-matchedAREX addresses

The original AREX plan called forPRED staff to do the clerical reviewof the unmatched and possible-matched records after the initialmatch of the administrativerecords addresses to the DMAF.However, resource constraints dueto changes in FAV plans requiredthat the review be shifted to theNational Processing Center (NPC).Accordingly, PRED trained approxi-mately 25 reviewers to evaluatethe possible matches of AREX

Table 15.Geographic Differences for Persons in Both the Top-Downand Bottom-Up Methods

Bottom-up

Top-down in

Bottom-upSame

AddressDifferentAddress

DifferentBlock

DifferentTract

DifferentCounty

Total . . . . . . . . . . . . . 2,275,456 2,258,441 17,015 15,129 11,847 2,363

Percent . . . . . . . . . . 100.0 99.3 0.8 0.7 0.5 0.1

Page 33: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 23

addresses against the DMAF andmake a match/non match determi-nation for the address.

Field Address Verification

To minimize the impact of the lackof experience, the listers were notused in the traditional role ofassigning action codes but ratherto collect information about theaddress for later analysis andassignment of the action code.The listers answered 11 questionsabout the property from which theaction code (called status code inthis operation) was later assigned.This modification worked well inminimizing the mistakes made byinexperienced staff and created acollateral benefit of collectingdetailed information about theaddresses for further research andanalysis.

One way to improve the list ofaddresses eligible for FAV is toimprove the identification andremoval of commercial addressesfrom the AREX address files. Theproduct used was the AmericanBusiness Information (ABI), Inc.database file of commercialaddresses (more than 10 million)based on national telephone direc-tories (both yellow and whitepages). Budgetary constraints pre-cluded purchase of the ABI residen-tial file. The use of both files(commercial and residential) wouldhave improved the accuracy ofcommercial address identificationand reduced the size of the FAV eli-gible address list as well.

It is difficult to gauge the impact ofthe FAV because the actual reviewwas carried out on a small sampleand because of the classificationerror associated with the imputa-tion based on the regression equa-tion. But there are some thingsthat can be learned from the FAVsample.

The sample addresses were drawnfrom a list that did not match anyaddress in the initial DMAF match.About 25 percent of the sampleaddresses found to be valid uponfield review were found to be validas listed. They represent about 10percent of all FAV eligible address-es. The remaining 75 percent ofvalid sample addresses (30 percentof all eligible addresses) werefound to be valid after lister cor-rections. It turned out that none ofthis group matched a Census 2000address in the second match to theDMAF nor, of course, did theuncorrected valid group. It is notknown whether any of theseaddresses truly represent address-es not in the DMAF or areunmatched as a result of inaccura-cies in the address matchingprocess.

Table 16 shows Bottom-up “best”administrative records addressesby FAV status.

It is interesting to note that FAVsample addresses selected as bestaddresses were much more likelyto be vacant than addresses thatwere not FAV eligible. The FAVimputed addresses had occupancyrates that were similar to the sam-ple. The number of persons count-ed at FAV sample addresses was2,162 and at FAV imputed address-es, 44,912 for a total of 47,074.Inflating the FAV sample personsby the reciprocal of the averageselection probability (i.e. 2,162(1/.

0433)) yields 49,931, much of thedifference presumably due toaddress misclassification as aresult of the imputation. However,the closeness of the numbers sug-gests that a 100 percent FAV wouldhave yielded results similar to thecombined sample and imputationscheme.

Including AREX vacant housing inthe Census Pull

The AREX address selection rulesresulted in almost 180,000 vacantaddresses thought to be valid forthe AREX test sites. Such address-es that are actually found in theAREX sites through a match to theCensus 2000 HDF would appear tobe conceptually similar to theaddresses included in the CensusPull. Both kinds of addresses rep-resent housing units in the AREXsites for which no administrativerecords persons were found to beresident. In both cases, it mighthave been that some addresseswere truly vacant on census dayand others truly occupied. For thelatter, deficiencies in the adminis-trative records or administrativerecords processing resulted in thepersons not being counted orcounted at the wrong address.

A match of the AREX vacantaddresses to the Census 2000 HDF,in fact, found about 76,000matched addresses, and almost67,000 were occupied in Census2000. (Refer to the analysis in

Table 16.Bottom-Up Best Address by FAV Status

FAV ValidImputed

ValidNot FAVEligible

Valid inSecond

DMAFMatch, Only Total

AREX Occupied. . . 1,084 24,703 855,946 3,775 885,508Percent . . . . . . . . . . 43 47 86 33 83AREX Vacant . . . . . 1,420 28,242 142,253 7,608 179,523Percent . . . . . . . . . . 57 53 14 67 17Total . . . . . . . . . . . . . 2,504 52,945 998,199 11,383 1,065,031Percent . . . . . . . . . . 100 100 100 100 100

Page 34: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Section 3 of this report.) Ofcourse, some of these persons maybe the same as persons counted atother administrative recordsaddresses; but the same could besaid for the persons found at theCensus Pull addresses. Therefore,in a Bottom-up process, both typesof addresses should have beencanvassed; and the AREX vacantaddresses that matched addressesin the Census 2000 HDF shouldhave been included in the CensusPull.

Unduplication after the Census Pull

There should have been an undu-plication of individuals after theCensus Pull by matching personsobtained in the Pull with thosefrom the administrative recordslists. The presence of duplicateindividuals is suggested not onlyby the overcounts of adults shownin various tables, but also by acomparison of the total number of

Bottom-up addresses with thenumber of Census 2000 addressesin the test sites. The total numberof addresses in the Bottom-up was1,217,810: 1,065,031 administra-tive record addresses and 152,779from the Census Pull. The numberof Census 2000 addresses in thetest sites was 1,094,204. Thus,there were 123,606 more address-es in the Bottom-up enumerationthan in Census 2000.

One way to accomplish the undu-plication would be to search andverify the SSNs for the individualsin the Census Pull and comparethem with the SSNs of the individu-als in the administrative recordslists. This might not be completelyeffective because being part of theCensus Pull suggests that blockingon address will not facilitate theSSN search. Alternatively, theCensus Pull individuals could bematched directly with the adminis-

trative record list blocking various-ly on such variables as surnameand date of birth.

When duplicate individuals werefound, the Census Pull could betaken as more accurate and theindividuals would be removedfrom the administrative recordsaddress. This approach couldresult in some additional vacantaddresses, so that the processmight have to be repeated severaltimes in order to identify the “best”address for all persons. In theend, there could be vacant admin-istrative record addresses thatshould have been filled by personserroneously located outside of theAREX sites in the administrativerecords systems. This would implythat a national unduplicationwould be part of a full Bottom-upcensus. Such an unduplicationwas out of scope for the experi-ment.

24 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Page 35: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 25

3.1 Introduction

The evaluation of the numericalfindings of the AREX was twofold.First there was a comparison of theresults of the Top-down andBottom-up enumerations with theCensus 2000 enumeration in theexperimental test sites (Heimovitz2002). This analysis progressedfrom large geographic areas tosmall geographic areas, beginningwith the five test site counties andending with Census 2000 blockswithin the sites. The outcomesevaluation tried to disentangle theinfluence of demographic changeand AREX processing, coverageand data quality issues, while pre-senting basic enumeration statis-tics. Below the county level, theanalysis focused on the Bottom-upenumeration because the county-level analysis was sufficient toshow the evident weaknesses ofthe Top-down process. Section 3.2provides some of the highlights ofportions of Heimovitz’s report;there was also a regression analy-sis that is omitted here.

The primary goal of the secondevaluation was to assess the accu-racy of households assembledfrom administrative records bycomparing them to Census 2000enumeration results at the sameaddresses (Judson and Bauder,2002). This was a particularlyimportant analysis for the type ofdesign that the AREX mountedbecause the completion of anadministrative records enumerationby canvassing addresses not foundin the records provides little oppor-tunity to correct enumerationerrors in the administrative records

themselves. Thus, it was impor-tant to learn as much as possibleabout the strengths and weakness-es in the administrative recordshouseholds with an eye towardfuture improvements.

In the course of the household-level analysis, some preliminaryinformation about a possible useof administrative records in a con-ventional census was obtained.The question of interest was:Under what conditions can admin-istrative records households besubstituted for conventionalNonresponse Followup (NRFU)households, or households forwhich occupancy status andhousehold demographics werewholly imputed (“unclassified”households)? This assessment wascarried out by matching the demo-graphic composition of AREXhouseholds to Census 2000 house-holds which were difficult to enu-merate in Census 2000. In addi-tion to a descriptive analysis, therewas a prediction-based approachto assess the ability to predictwhen an AREX household is likelyto demographically match a censushousehold. Section 3.3 provides asummary of the results in thereport by Judson and Bauder.

3.2 Administrative RecordsExperiment enumerationoutcomes

Methodology

Concept

The enumeration outcomes analy-sis provides measures of how wellAREX replicates Census 2000results at county and subcounty

levels focusing on key demograph-ic characteristics that are importantfor decennial census requirementsbut also relate to the possible useof administrative records for inter-censal and small-area estimation.A series of research questions pro-vides a conceptual outline of thebasic elements of the evaluation.General questions at larger geogra-phies are posed first:

• How well does AREX measuretotal census population at thecounty level, and how do theresults differ by whether theTop-down or Bottom-upapproaches were used?

• How do county-level differencesbetween AREX and census differby age, race, sex, and Hispanicorigin, as well as between theTop-down or Bottom-upapproaches?

A related question, how well doesAREX measure the voting age pop-ulation (age 18+) of state legisla-tive districts, is discussed inHeimovitz, 2002.

In a decennial census, total popula-tion counts are needed for con-gressional apportionment. Thevoting age (18+) population byrace and Hispanic origin potentiallymeets the data requirements forlegislative redistricting. Populationcounts of persons under age 18are needed by states for planningpurposes and estimating childpoverty rates. Greater differencesbetween AREX and census countsare more likely at smaller geogra-phies. But focusing on smallergeographies allows more detailed

3. Administrative Records ExperimentOutcomes and Household Evaluations

Page 36: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

analyses of neighborhood charac-teristics and whether these attrib-utes are linked with AREX-Census2000 differences: In particular, howdoes the accuracy of tract andblock counts compare to countyresults?

Outcome measures

The terms ‘undercount’ and ‘over-count’ describe how well AREXcounts match Census 2000 resultsand have no further connotation.That is, undercounts and over-counts reflect any of several prob-lems, including coverage issues,coding, and processing errors.Outcome and predictor constructsare distinguished and used to high-light AREX-Census 2000 Bottom-upand Top-down differences. Theoutcome measures used in thisconsolidated report are limited tothe simple count differencesbetween AREX and Census 2000counts and to the algebraic percenterror (ALPE). The full outcomesanalysis (Heimovitz 2002) providesadditional measures.

Difference

The simple difference betweenAREX and Census 2000 gauges thecounty-level over and under-counts:

where:

Ai = AREX tallies in county

Ci = Decennial census tallies incounty

Algebraic percent error (ALPE)

AREX and Census 2000 counts arethe inputs for calculating the alge-braic percent error for the ithcounty, tract, or block:

Where:

Ai = AREX tallies in the ith county,tract, or block; and

Ci = Decennial census tallies inthe ith county, tract, or block

Two problems can occur whencomputing ALPEs: zero blocks andinflated ALPEs. Zero blocks occurwhen AREX reports in a particularblock at least one person having aparticular characteristic but censusdoes not. Because Census 2000 isbeing used as the standard and isthe denominator, ALPEs for zeroblocks are undefined. For the pur-pose of block comparisons, zeroblocks are omitted from the analy-ses. However, county and tract-level counts and comparisonsinclude these blocks because theyare aggregated at larger geogra-phies.

Inflated ALPEs can sometimesoccur when Census 2000 blockshave very small counts and tend toproduce large, positive ALPEs,despite small differences betweenAREX and Census 2000 counts.For example, C=1 and A=3 yieldsan ALPE=2. Such a large ALPE isquite unlikely when the size ofC–the number of persons enumer-ated in the census area–is large.Small census counts are not unlike-ly, for example, for racial minori-ties in sparsely populated areas.To reduce the impact of unusuallylarge ALPEs, ALPEs were trimmed(topcoded) by setting all valuesgreater than the 95th percentile ofthe ALPEs across the areas in theanalysis to the value of the 95thpercentile. Still, care should betaken in interpreting results forthose analyses where the popula-tion is sparsely populated withinthe geographic units of interest.

There is an additional problemwhen computing differences orALPEs for racial subpopulations.

The problem stems from the differ-ences between AREX and Census2000 classifications. Both AREXand Census 2000 have the fourtraditional categories: White,Black, American Indian/AlaskanNative, and Asian/Pacific Islander8.But Census 2000 permits respon-dents two additional options:“multiple race,” and “Other race.”These additional groups were quitesmall for Maryland. For Colorado,the multi and other race groupswere much larger, encompassingmore than 8 percent of the Census2000 population for El PasoCounty. In the following outcomesanalysis, no attempt was made todistribute either additional racecategory across the four commoncategories. Excluding Census 2000respondents with multi or otherrace could result in positive differ-ences and ALPEs for race sub-groups, especially for minoritygroups, that might not haveoccurred had the AREX and censusclassifications been the same.

Descriptive analyses

This section is intended to be atop-level, descriptive summary ofAREX-Census 2000 differences, bycounty, tract, and block. County-level counts and proportions arecompared and display the raw,untransformed numbers not shownin the multivariate analyses. Thecount differences describe theaggregate under- and over-countsof age, race, sex, and Hispanic ori-gin categories, while the ALPEsshow the contribution these cate-gories have on the under- andovercounts. One important aspectof the bivariate analyses is the eco-logical variation within the AREXcounties. Thematic maps profile the heterogeneous AREX-Census

ALPE(Ai,Ci) = Ai - Ci

Ci

DIFF(Ai,Ci) = Ai - Ci

26 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

8 Multiple Census 2000 categories werecombined for Asian/Pacific Islander.

Page 37: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

2000 differences in block-leveltotal population counts.

AREX Top-down counts include per-sons later identified in Bottom-up as group quarters residents;

Bottom-up and Census 2000counts exclude group quarters res-idents and differ somewhat fromcounts in earlier tables for whichthere were no exclusions.

County-level count results

Total population

Total population results for the twoMaryland counties and threeColorado counties are reported inTable 17.

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 27

Table 17.Top-down and Bottom-up Counts of Total Household Population by County

Top-down Results Bottom-up Results

AREX Census DifferenceALPE

(percent) AREX Census DifferenceALPE

(percent)

Baltimore County . . . . . . . . . . . . . . . . . . 696,183 736,652 –40,469 –5.5 728,205 736,652 –8,447 –1.1Baltimore City . . . . . . . . . . . . . . . . . . . . . 570,648 625,401 –54,753 –8.8 636,729 625,401 +11,328 +1.8Douglas County . . . . . . . . . . . . . . . . . . . 148,270 175,300 –27,030 –15.4 169,640 175,300 –5,660 –3.2El Paso County. . . . . . . . . . . . . . . . . . . . 456,891 501,533 –44,642 –8.9 494,253 501,533 –7,280 –1.5Jefferson County . . . . . . . . . . . . . . . . . . 473,495 519,326 –45,831 –8.8 503,622 519,326 –15,704 –3.0

Figure 2.Net Population Difference by Sex, County, and Collection Method—CO

-60000

-40000

-20000

0

20000

40000

60000

Douglas T/D Douglas B/U El Paso T/D El Paso B/U Jefferson T/D Jefferson B/U

County and collection method

Number of personsMaleFemale

Page 38: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

AREX undercounted all five coun-ties in the Top-down and four offive counties in Bottom-up. Thegreatest Top-down differenceswere in Baltimore City andJefferson County. Bottom-upundercounts are much smaller thanTop-down undercounts in all fivecounties for total population anddemographic characteristics.

Sex

Males and females are undercount-ed by the Top-down method in all

five counties. Bottom-up under-counts are much smaller for allcounties, and males are overcount-ed in Baltimore City. (BaltimoreCTY is Baltimore County.)

Age

In the Maryland counties, Top-down overcounts the 75+ popula-tion and undercounts other agegroups; Bottom-up overcounts the20-44, and 65+ age groups andundercounts all other age groups.In both Maryland and Colorado,

Top-down undercounts are greatestfor the 0-19 age groups and showthe greatest improvements forBottom-up counts relative to Top-down. In the Colorado counties,generally, age 20-24 and 65+ agegroups are overcounted and otherage groups are undercounted forboth Top-down and Bottom-upmethods.

28 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

County and collection method

Number of personsMaleFemale

Figure 3Net Population Difference by Sex, County, and Collection Method—MD

-60000

-40000

-20000

0

20000

40000

60000

Balt. Cty T/D Balt. Cty B/U Baltimore T/D Baltimore B/U

Page 39: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 29

County and collection method

Number of persons

Figure 4. Net Population Difference by Age, County, and Collection Method—MD

-60000

-40000

-20000

0

20000

40000

60000

Balt. Cty T/D Balt. Cty B/U Baltimore T/D Baltimore B/U

Age 0-4Age 5-19Age 20-24Age 25-34Age 35-44Age 45-54Age 55-64Age 65-74Age 75-84Age 85+

0-4

85+

0-4

85+

0-4

85+

0-4

85+

Page 40: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Race

In the Maryland counties, Hispanicswere overcounted and other minor-ity race groups were generally

undercounted in Top-down andBottom-up. In the Bottom-upmethod, Whites and Blacks wereovercounted in Baltimore Citywhere Blacks are a majority. In the

Colorado counties, Blacks and APIswere generally overcounted whileother race categories andHispanics were undercounted inTop-down and Bottom-up methods.

30 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Figure 5.Net Population Difference by Age, County, and Collection Method—CO

-60000

-40000

-20000

0

20000

40000

60000

Douglas T/D Douglas B/U El Paso T/D El Paso B/U Jefferson T/D Jefferson B/U

County and collection method

Number of persons

Age 0-4Age 5-19Age 20-24Age 25-34Age 35-44Age 45-54Age 55-64Age 65-74Age 75-84Age 85+

0-4

85+

0-4

85+

0-4

85+

0-4

85+

0-4

85+

0-4

85+

Page 41: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 31

County and collection method

Number of persons

Figure 6.Net Population Difference by Race, County, and Collection Method—MD

-60000

-40000

-20000

0

20000

40000

60000

Balt. Cty T/D Balt. Cty B/U Baltimore T/D Baltimore B/U

White

Black

AI API Hispan

ic

White

Black

AI API Hispan

ic

White

Black

AI APIHisp

anic

White

Black

AI API Hispan

ic

Page 42: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

One general pattern from tablesand figures above is the relation-ship between population sharesand AREX under- and overcount.Race and Hispanic origin groupswith smaller shares tend to beovercounted, and groups with larg-er shares tend to be undercounted.Examples of overcounts areHispanics in the Maryland counties,Whites in Baltimore City, and Blacksand APIs in the CO counties.

The very large Top-down under-count of Blacks in Baltimore City isdue largely to the inappropriateuse of the race model for childrenin the Top-down process. The

Black count changes dramatically

in the Bottom-up in which children

with unknown race are generally

assigned the race of the primary

taxpayer.

County-level ALPE results

The county-level analysis builds on

the AREX-Census 2000 count

results by examining the algebraic

percent error (ALPE). The ALPE

measure provides a different view

of the county-level results because

the calculation method uses cen-

sus group totals as bases and pro-

vides a standardized gauge for

comparing differences between

Top-down and Bottom-up, as wellas between counties.

Total population

All county Bottom-up ALPEs weresmaller than Top-down ALPEs;Bottom-up ALPE improvementswere variable. Both DouglasCounty and Baltimore City had Top-down ALPEs of -8.8 percent, butBottom-up for Douglas County was-3.2 percent compared to +1.8 per-cent for Baltimore City. The small-est total population Bottom-upALPE was in Baltimore County (-1.1 percent); the largest Bottom-up ALPE was in Douglas County (-3.2 percent).

32 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

County and collection method

Number of persons

Figure 7. Net Population Difference by Race, County, and Collection Method—CO

-60000

-40000

-20000

0

20000

40000

60000

Douglas T/D Douglas B/U El Paso T/D El Paso B/U Jefferson T/D Jefferson B/U

White Bla

ckAI APIHisp

anic

White

Black

AIAPIHisp

anic

White Bla

ckAI APIHisp

anic

White

Black

AI APIHisp

anic

White Bla

ckAIAPIHisp

anic White

Black

AI APIHisp

anic

Page 43: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Sex

Male and female Bottom-up ALPEs

were relatively small in all five

counties and ranged from – 4.8 to

+ 4.2 percent.

Sex proportions were undercount-

ed in all counties (except Baltimore

City males) and generally are unbi-

ased, reflecting the magnitude oftotal county-level proportions.Female undercounts were slightlyworse than male undercounts andgenerally had a marginal differenceof less than 2 percent in Bottom-up. Some women may be lessactive within the administrativerecords systems. For example,some studies indicate that lifetime

participation in the labor force

varies by a woman’s child raising

and care giving experiences, health

status, and race/ethnicity (Flippen

and Tienda, 2000). However,

lower mortality rates for women

might offset lower labor force par-

ticipation with respect to AREX/

Census 2000 comparisons.

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 33

County and collection method

MaleFemale

Figure 8. Sex ALPE by County and Collection Method—MD

-0.500

-0.400

-0.300

-0.200

-0.100

0.000

0.100

0.200

0.300

0.400

0.500

Balt. Cty T/D Balt. Cty B/U Baltimore T/D Baltimore B/U

ALPE

Page 44: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Age

Generally, younger age groups

(especially the 0-4 age group) had

the largest negative ALPEs in all

five counties. Bottom-up ALPEs for

the 0-4 age group ranged from–33.9 percent in Jefferson Countyto –23.4 percent in Baltimore City.Older age groups (65-74, 75-84,and 85+) tended to have positiveALPEs that increased by age.

Bottom-up ALPEs were generally

smaller due to the Census-pull

households that replaced

unmatched Census 2000

addresses.

34 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

County and collection method

MaleFemaleALPE

Figure 9: Sex ALPE by County and Collection Method—CO

-0.500

-0.400

-0.300

-0.200

-0.100

0.000

0.100

0.200

0.300

0.400

0.500

Douglas T/D Douglas B/U El Paso T/D El Paso B/U Jefferson T/D Jefferson B/U

Page 45: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 35

County and collection method

Figure 10.Age ALPE by County and Collection Method—MD

-0.500

-0.400

-0.300

-0.200

-0.100

0.000

0.100

0.200

0.300

0.400

0.500

Balt. Cty T/D Balt. Cty B/U Baltimore T/D Baltimore B/U

ALPE

Age 0-4Age 5-19Age 20-24Age 25-34Age 35-44Age 45-54Age 55-64Age 65-74Age 75-84Age 85+

0-4

0-4

0-4

85+

0-4

85+ 85+85+

Page 46: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

The large negative ALPEs for chil-dren and the large positive ALPESfor older groups are due mostly tothe weaknesses in the administra-tive records discussed in Section 2:missing births and deaths andmigration as a result of the cutoffdates of the administrative recordfiles used in the AREX, missingdependents on IRS 1040s, andmissing children of parents whodid not have to file 1040s or wereotherwise not found in the admin-istrative records. Persons aged65+ were generally overcounted inall five counties, and persons age

85+ displayed Bottom-up over-counts ranging from about 2 per-cent to 36 percent–77 percent inless-populated Douglas County.Because the 85+ population is rela-tively small, the denominators ofthe ALPE calculations are likely tobe small and potentially inflateALPE measures.

The 20-24 year age group also haslarge positive ALPEs in some of theAREX counties. This might be dueto the handling of special popula-tions to which this age groupbelongs: college and universitypopulation, and the military.

College-age persons whose resi-dence may have been reported at aparent’s IRS tax address may actu-ally reside on a campus in a differ-ent area. Removing group quar-ters from the Census 2000 countsbut not from the Top-down countswould bias Top-down ALPEs in thepositive direction. Removinggroup quarters from the Bottom-upcounts would still leave depend-ents claimed on IRS 1040 at thewrong location with respect todecennial residency rules.

Douglas County appears to be aspecial case. The Census 2000

36 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

County and collection method

ALPE

Figure 11: Age ALPE by County and Collection Method-CO

-0.500

-0.400

-0.300

-0.200

-0.100

0.000

0.100

0.200

0.300

0.400

0.500

Douglas T/D Douglas B/U El Paso T/D El Paso B/U Jefferson T/D Jefferson B/U

Age 0-4Age 5-19Age 20-24Age 25-34Age 35-44Age 45-54Age 55-64Age 65-74Age 75-84Age 85+

0-4

85+

0-4

85+ 85+

0-4

85+

0-4

85+

0-4

85+

0-4

Page 47: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

population age 20-24 is 3.1 per-cent, less than half that ofColorado (7.1 percent) and thenational average (6.7 percent). Butthe Air Force Academy and severalother schools are located inDouglas County. The large Top-down ALPE may be due to the factthat group quarters were notremoved from the administrativerecords. Although there was anattempt to remove group quartersfrom Bottom-up enumeration, thelarge Bottom-up ALPE for age 20-24 suggests this may not havebeen fully successful.

Race

It is difficult to interpret Top-downrace ALPEs because of the con-founding effects of general under-counts, especially for children, andthe use of the race model for chil-dren under 15 with “other” orunknown race in the administrativerecords. The following discussionwill focus on the Bottom-upresults.

There are a number of reasons forthe patterns of Bottom-up race andHispanic origin ALPEs. First is theuse of the race model. As dis-

cussed earlier, the race model was

a national-level model and varia-

tion about its predictions can be

expected. The use of the model in

small geographic areas would tend

to overstate the number of persons

in those race groups that are less

than the national average and

understate the number of persons

in groups that are above the

national average. When modeled

race was assigned to children from

an adult in the same household,

the result would be reinforced.

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 37

County and collection method

Figure 12.Race ALPE by County and Collection Method—MD

-0.500

-0.400

-0.300

-0.200

-0.100

0.000

0.100

0.200

0.300

0.400

0.500

Balt. Cty T/D Balt. Cty B/U Baltimore T/D Baltimore B/U

ALPE

WhiteBlackAIAPIHispanic

White

Black

AI API

Hispan

ic

White

Black

AI API

Hispan

ic

White

Black

AI API Hispan

ic

Black

AI API

Hispan

ic

White

Page 48: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

It is important to keep in mind thatthe race model was used for onlythose adults whose administrativerecords did not provide a raceother than “other” or unknown.Table 3 in Section 2.2 shows theproportion of adults and childrenwith imputed race in each of thetest sites. The number of adultswith imputed race ranged fromabout 3 to 9 percent and was sub-stantially lower in Maryland thanColorado. For Hispanic origin, theimputation was used well over 90percent of the time because, forthe most part, administrativerecords provide no direct measureof ethnicity.

Other factors possibly affectingBottom-up ALPE race patternswere: The possible correlationbetween weaknesses in AREX pop-ulation coverage and race orHispanic origin, unaccounted formigration and demographicchanges due to the age of theadministrative records files, thepossible duplication of personsdue to the Census Pull, the prob-lem of comparing AREX andCensus 2000 race groups becausethe latter allows “multi” and “other”and the former does not, and thepositive ALPE bias for cells withsmall denominators. Examiningseveral of the race ALPE results

shows the complexity of the possi-ble explanations.

For the Bottom-up, Black ALPEswere positive in all three Coloradocounties and Baltimore City andnegative in Baltimore County(where blacks are a large minorityrace group). The overcount ofBlacks in Colorado was most likelydue to the race model because theproportion of Blacks in Coloradowas much smaller than the nation-al average; and at the same time,the proportion of adults inColorado with imputed race wasrelatively high, ranging from 6 to 9percent. The undercount of Blacks

38 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

County and collection method

ALPE

Figure 13.Race ALPE by County and Collection Method-CO

-0.500

-0.400

-0.300

-0.200

-0.100

0.000

0.100

0.200

0.300

0.400

0.500

Douglas T/D Douglas B/U El Paso T/D El Paso B/U Jefferson T/D Jefferson B/U

WhiteBlackAIAPIHispanic

Whi

te

Blac

kA

IA

PIH

isp

anic

Whi

te

AI

API

His

pan

ic

Whi

te

AI

API

His

pan

ic

Blac

kA

I

His

pan

ic

Whi

te

Blac

kA

I API

His

pan

ic

Whi

te

AI

API

His

pan

ic

Blac

k

Blac

k

Whi

te

API

Blac

k

Page 49: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

in Baltimore County might be duein part to the use of the racemodel; but it might also be due inpart to the migration of Blacksfrom Baltimore City to the Countyin the period between the adminis-trative records cutoffs and April 1,2000. The reasons for the over-count of Blacks in Baltimore Cityare less clear but might also bedue to unaccounted for migrationof Blacks from the city. An over-count is the reverse of what wouldbe expected if the race model werethe main cause, and the proportionof adults with modeled race wasunder 3 percent.

The ALPEs for APIs were positive inall three Colorado counties andBaltimore City and negative inBaltimore County. This wouldappear again to be a race model

effect, except for Baltimore County,because nationally, all five countieshave API proportions below thenational average. Evidently, thenet effect of these differencesincreased the size of the Census2000 API counts enough so thatAPI ALPEs for all of the AREX siteswould have been negative hadthey been calculated from thesedistributions. In any case, it issimply a matter that ALPE is sensi-tive for small population sub-groups.

Concerning APIs, the substantialnegative ALPEs in all counties werenot unexpected. Identifying AIANrace is weak in the administrativerecords, except in areas aroundreservations, and AIAN predictionwas the weakest part of the racemodel as well.

Hispanic ALPEs were positive inboth Maryland counties where theyare a small minority group andnegative in all three Coloradocounties where Hispanics are thelargest minority group. The modelfor Hispanic origin was applied toabout 97 percent of adults inMaryland and is most likely thereasons for the substantial over-counts there. Again, one mighthave also expected small over-counts in Colorado were model usethe main factor. (See the discus-sion of Hispanics in Section 2.) Butthe substantial undercounts sug-gest that other factors may be atwork such as high birth rates andnet in-migration of Hispanics in theperiod missed by the administra-tive records used in the AREX.

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 39

Percent of tracts

Figure 14.Distribution of Tracts with Under- and Overcounts of Total Population

0

20

40

60

80

100

BaltimoreCounty

Baltimore Douglas El Paso Jefferson

County

ALPEs

50% or more25% to 49%5% to 24%-5% to 4%-25% to -6%-50% to -26%51% or less

Page 50: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Tract ALPE distributions

Figure 14 shows the ALPE distribu-tions for the five AREX counties. Inall sites other than Baltimore City,more than 70 percent of tracts hadAREX total population counts with-in +/-5 percent of census results,and more than 95 percent of tractshad counts within 25 percent ofcensus results. Baltimore City hadless accurate results with about 50percent of tracts exceeding +/-5percent of census results. A largerproportion of tracts had moderateand large ALPE undercounts (lessthan –5 percent) compared to over-counts.

Though the tract-level ALPEs forthe total population resemblecounty-level results, the distribu-tions indicate more Baltimore City

tracts were overcounted. It isunclear whether these overcountsare related to persons who wereactually uncounted in the census,or more likely, weaknesses in AREXprocessing. Households may havebeen added through the CensusPull process that replacedunmatched addresses that existedin other tracts or addresses.

Block ALPE distributions

The block-level ALPE resultsdescribe the accuracy of counts atthe smallest geographic level andrelative to counties and tracts.The main problem with this type ofcomparison is the ALPE denomina-tor potentially inflates block-levelALPEs for small population sub-groups and especially minorities.This inflation is likely to be greater

than found in the tract-countycomparisons. A second issueaffecting comparisons is the exclu-sion of blocks where Census 2000did not identify persons with a par-ticular attribute (zero blocks).County and tract ALPEs includeblocks with zero counts becausethese blocks were accumulatedinto larger geographies. However,the block-level ALPEs used thereduced set of blocks and theresults may be quite differentwhen comparing the ALPEs at vari-ous geographies.

AREX was less accurate in estimat-ing blocks than tracts in all coun-ties. Population totals for 18 to 39percent of blocks were within 5percent of Census 2000, and about85 percent were within 25 percentof the census. Douglas County

40 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Percent of blocks

0

20

40

60

80

100

BaltimoreCounty

Baltimore Douglas El Paso Jefferson

County

ALPEs

Figure 15.Distribution of Blocks with Under- and Overcounts of Total Population

50% or more25% to 49%5% to 24%-5% to 4%-25% to -6%-50% to -26%51% or less

Page 51: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

had the best results at the 5 per-cent criterion and Baltimore Countywas best at the 25 percent criteri-on. In the Maryland counties,slightly more blocks had moderateor large overcounts (ALPEs exceed-ing 5 percent), compared to theColorado counties where moreblocks had moderate undercounts(-5 to -24 percent).

The AREX counts were less accu-rate at the block-level. Populationcounts are likely to be less accu-rate in smaller areas due to incor-rect assignment of households attracts and blocks that average outfor county-level counts. This isdemonstrated by the greater num-ber of moderate and large ALPEsand indicates how smaller denomi-

nators and AREX processing weak-nesses influenced the compar-isons. Though zero blocks wereexcluded and fewer blocks met the5 percent criterion, a large propor-tion of blocks met the 25 percentcriterion in all five counties.

Geospatial tract-level heterogeneity

Figure 16 and 17 exhibit the geo-graphic distribution of AREX-Census 2000 tract ALPEs for totalpopulation counts. Baltimore Cityis the nucleus of the MarylandAREX site (Figure 16) and the mosturban of all the sites. It hasnumerous tracts with large underand overcounts. The tract-leveltotal population was clearly meas-ured better in Baltimore County.

There is also evidence of tractsclustering by size of under andovercounts. Downtown Baltimoreand Towson include islands ofmoderate and large undercounts,while clustered moderate over-counts are more frequent in otherparts of the City and County.Denver, in the north, and ColoradoSprings are metropolitan centers inthe Colorado site (Figure 17).Generally, the tract-level CO popu-lation was counted more accurate-ly in the suburbs of each city,while urban and rural tracts tendedto have moderate undercounts.

Summary and conclusions

The forgoing analysis providedmeasures of how well AREX

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 41

Tgr24005trt00jun29 by pdtotal

Total Population ALPEs-Baltimore County

-0.38 to -0.205 (2)-0.205 to -0.055 (20)-0.055 to 0.045 (159)0.045 to 0.205 (21)0.205 to 1 (2)

Tgr24510trt00jun29 by pdtotal

Total Population ALPEs-Baltimore City

-0.205 to -0.055 (27)-0.055 to 0.045 (102)0.045 to 0.205 (68)0.205 to 1.1 (3)

Figure 16. AREX - Census Total Population ALPEs: Maryland Tracts

Page 52: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

replicated Census 2000 results atseveral geographic levels focusingon key demographic characteristicsthat are important for a decennialcensus. As expected, the Bottom-up method performed better thanthe Top-down method because ofthe simulated canvassing of house-holds (the Census Pull) at address-es not found in the administrativerecords. The Bottom-up processundercounted total population inall sites except Baltimore City.Algebraic percent errors for coun-ty-level population totals were lessthan 5 percent though the resultswere not as good for subcountyand demographic subgroups.

If the Bottom-up process wereunbiased and counted all

demographic groups in the sameway, ALPEs for all demographic cat-egories would have had the samerelative size. As with the totalpopulation, males and femaleswere undercounted in all sitesexcept Baltimore City, but thefemale undercounts were slightlygreater than male undercounts.Age group ALPEs show more vari-ability with most groups under-counted except the 20-24 groupand the oldest age groups.Generally the size of the under-counts increased with decreasingage. These patterns did notappear to be site-specific and arethe result of the weaknesses of theadministrative records and certainAREX processing decisions as

discussed in Section 2.Overcounts for the oldest old andundercounts for the youngest per-sons suggest that much more time-ly birth and death informationmust be obtained. And the specialenumeration requirements for pop-ulations such as college students,the military and persons in nursinghomes must be incorporated intoadministrative records processes.

Bottom-up tract-level total popula-tion ALPE results indicated a goodcorrespondence between AREX andCensus 2000 (70 percent of tractsmet the 5 percent criterion; and 95percent met the 25 percent criteri-on), though a sizable number oftracts had moderate and largeALPE undercounts. The block-level

42 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Figure 17. AREX - Census ALPEs for the Total Population: Colorado Tracts

Tgr08035trt00jun29 by pdtotal

Total Population ALPEs-Douglas County

-0.205 to -0.055 (6)-0.055 to 0.045 (32) 0.045 to 0.205 (1)

Tgr08041trt00jun29 by pdtotal

Total Population ALPEs-El Paso County

0.045 to 0.205 (4)-0.055 to 0.045 (99)-0.205 to -0.055 (8)

Tgr08059trt00jun29 by pdtotal

Total Population ALPEs-Jefferson County

-0.205 to -0.055 (24)-0.055 to 0.045 (107) 0.045 to 0.205 (2)

Page 53: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

ALPE results provided the leastaccurate measure of total popula-tion (38 percent of blocks met the5 percent criterion; and about 85percent met the 25 percent criteri-on), compared to tract and countyresults9.

The regression results confirmsome of the key findings from theunivariate and bivariate analyses.Among the mobility variables, bothvacancy rate and rental rate wereassociated with under and over-counts. Generally, rental rate hada greater association with under-counts and vacancy rates had agreater association with over-counts in both AREX sites. Asobserved in the bivariate analyses,large proportions of persons underage 5 and 20-24 were associatedwith undercounts in both sites.And in CO, large proportions ofpersons age 65+ were associatedwith overcounts, other factors heldconstant (Heimovitz, 2002).

3.3 Household-levelanalysis

Methodology

Concept

The general goal of the household-level analysis (Judson and Bauder,2002) was to assess how wellhouseholds formed from adminis-trative records matched those fromCensus 2000 at the same address-es in the Hundred Percent Detail

(HDF) file. The analysis did notinclude group quarters or thehouseholds found at addresses notin the administrative records files.An assessment of group quarterswas beyond the scope of thisanalysis because AREX did notmount the operations that wouldhave been needed to enumeratespecial populations in an adminis-trative records census context.And, the Bottom-up “NRFU” house-holds could not be evaluatedbecause the canvassing was simu-lated by simply including theCensus 2000 households at the rel-evant addresses.

The household-level analysisassessed the ability of AREXadministrative records householdsto match the demographic compo-sition of all households, but therewas a special focus on Census2000 households that required anonresponse followup and Census2000 unclassified households. InCensus 2000, addresses that didnot respond to the mailout had tobe enumerated by nonresponsefollowup procedures. NRFUaddresses are the most expensiveto enumerate and may representthe most vulnerable segment ofAmericans. The household-levelanalysis provided a preliminarylook at the conditions under whichhouseholds formed from adminis-trative records could be used forconventional NRFU households,obviating the need for fieldwork inthose cases.

Addresses that had the status“unclassified” in Census 2000 werethose for which so little informa-tion was available that occupancystatus had to be imputed, and,conditional on being imputed“occupied,” the entire household,including characteristics, had to beimputed as well. This treatment ofunclassified households was thesubject of a lawsuit reaching the

U.S. Supreme Court (Utah v. Evans),in which the plaintiffs objected tothe imputation substituting forenumeration. Although the censusmethodology prevailed, the possi-bility of enumerating these typesof addresses by administrativerecords might provide a usefulalternative to traditional imputa-tion. This section provides someinformation comparing administra-tive records enumeration andCensus 2000 imputations for theCensus 2000 unclassified house-holds in the AREX test sites.

Special terminology

For this section, the term “censushousehold” refers to the personsenumerated at an address inCensus 2000. The term “AREXhousehold” refers to persons at anadministrative records address.“Household size” refers to thenumber of people in the housingunit. For convenience, these defi-nitions are applied to vacant hous-ing units, so that when a Censusor AREX address contains no peo-ple, the housing unit is assigned ahousehold size of zero. We usethe term “imputed household” forunclassified addresses whose occu-pancy status and household char-acteristics have been imputed.

Pairs of addresses (AREX andCensus) that were matched bycomputer or clerical processes arereferred to as “linked” housingunits. The term “linked house-holds” is used when comparing theproperties of people within linkedhousing units. The term “demo-graphic match” is used when twohouseholds have the same age,race, sex, and Hispanic origin dis-tribution.

Finally, the term “AREX data” isused for administrative dataobtained from the Bottom-up oper-ations (i.e., including DMAF link-age, clerical review and FAV). The

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 43

9 From data not shown in this report(but available from Heimovitz, 2002) ALPEresults for sex and age were similar for tractand county analyses. Baltimore City had theworst results for total and demographic ALPEmeasures but the most accurate results forblacks. However, Baltimore City also had thelargest proportion of census pull records andsmallest proportion of imputed black racecodes. For the race/Hispanic minoritygroups, the relative size of the minority pop-ulation in the tract was associated with howwell AREX simulated Census results. Tractswith small minority proportions were morelikely to have moderate or large positiveALPEs than other tracts.

Page 54: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

term “Census data” is used for dataobtained from the Census 2000HDF file.

Descriptive analysis

The household-level evaluationused both descriptive analyses andmultiple regression analysis toassess the coverage and accuracyof AREX households. Descriptiveanalyses were performed for linkedhouseholds in all five AREX coun-ties and for the Census 2000 NRFUand imputed households in thetest sites. These analyses provid-ed the following evaluations:

• Coverage by AREX of its intend-ed universe by determining thenumber and proportion ofCensus 2000 addresses thatwere matched by AREX address-es;

• Characteristics of Census 2000households associated withAREX/Census matched address-es;

• Comparison of AREX and Census2000 distributions of householdsize and household demograph-ic characteristics;

• Characteristics of AREX house-holds associated withAREX/Census 2000 household-to-household comparisons,including such properties as thepresence of a person in thehousehold of a particular race orethnicity, and the presence of aperson with a characteristic thatwas imputed in AREX.

Prediction model

To learn more about the character-istics of administrative recordshouseholds that match the censusand to take a first look at thepotential uses of administrativerecords data to substitute for somepart of the nonresponse followupor unclassified households in aconventional census, a logistic

regression model was developedwith the AREX/Census 2000 linkedhouseholds as the units of analy-sis. The functional form of themodel is Logit (Match=1|x) = xβ

where Match is a dichotomousdependent variable, x is a vectorof regressors, and β is a vector ofconstants to be estimated. Foreach linked address, the depend-ent variable was defined as fol-lows:

This measure was based on thedistribution of personal character-istics within an address and not onmatches of individual persons. Anaddress in AREX and in the censusthat had exactly the same distribu-tional characteristics but werecomposed of entirely different per-sons would still receive a matchscore of 1. The simpler dependentvariable–1 if all persons were thesame, 0 otherwise–was not usedbecause the AREX operations didnot provide for matching individu-als from AREX and census enumer-ations. Considering that the agedistribution is in 5-year groups, thematch definition used wouldappear to provide a result veryclose to an exact person match.

The regressors include characteris-tics of AREX households and char-acteristics of the linked addresses,representing the kind of informa-tion that would be available weredata from administrative records tobe used in support of a conven-tional census.

Limitations

The principal limitations on theability to link addresses and demo-graphically match householdsstemmed from the same deficien-cies of the AREX administrative

records files discussed in previoussections. First, the administrativedata extracts were taken a year ormore before census day. Thismeans that movers, births, deaths,immigration and emigration, newhousing, abandoned and demol-ished housing were unaccountedfor a period of 12 or more monthsprior to census day. Second, manychildren are unaccounted for inadministrative records at thenational level; and therefore, AREX2000 had difficulty enumeratingchildren, generally, and, specifical-ly, by virtue of the time lag prob-lem and the limited demographicsavailable for children on theNumident file (Miller, Judson, andSater, 2000). Third, the race meas-urement and reporting deficienciesof the administrative records anddifferences in race measurementbetween AREX and the census pre-sented serious challenges to com-parisons matching race andHispanic origin between membersof AREX and census households.Finally, virtually all persons identi-fied as having Hispanic origin inthe AREX were imputed as suchthus weakening the comparisons.

The AREX FAV had little impact onthe household-level analysis; and a100 percent FAV, if actually carriedout, would have had little impactas well. Persons at administrativerecords addresses that would havebeen completely lost to the AREXas a result of the FAV would havehad no impact on the household-level analysis since none of theiraddresses match the DMAF. And,as discussed in Section 2, therewere would have been very fewpersons who remained in the enu-meration but at different addressesas a result of the FAV.

Finally, deficiencies in administra-tive records and HDF addresses(for example, address duplication)and address matching technology

44 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Match ={10

if the fully crossed age x race x sex x Hispanic origin distributions in the linked Census household match the AREX household; otherwise.

Page 55: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

resulted in a number of cases inwhich more than one administra-tive record address matched thesame HDF address and vice versa.All of the administrative recordsaddresses that matched the HDFbut not on a one-to-one basis wereexcluded from the analyses.

Descriptive analysis

AREX and Census address linkage

In the five counties covered by theexperiment, the Census 2000 HDFcontained 1,092,460 housing units(HUs) and 1744 group quarters(GQs), the latter excluded from thisanalysis. 24,584 (2.3 percent) ofcensus households were “imputedhouseholds,” and 360,914 (33.0percent) were in the Census 2000NRFU universe.

Of the 1,065,031 AREX addresses992,865 were linked with address-es that existed in Census HDF; but103,227 of the AREX addresses didnot have a one-to-one link andwere also excluded. This left889,638 linked AREX addressesavailable for the household-levelanalysis. They represented 81.4percent of census addresses.

Table 18 provides data on overalladdress linkage. AREX housingunits (i.e. addresses) were linkedwith 84.0 percent of the 1,017,273occupied census housing units.AREX housing units were linkedwith 46.4 percent of the 75,187vacant census housing units.About 88 percent of AREX vacantunits were found to be occupiedby the census.10 This confirms thediscussion in Section 2 in which itwas suggested that the AREXvacant addresses should have beencanvassed as part of the Bottom-upprocess.

AREX’s coverage of the CensusNRFU universe was not as good asits coverage of the non-NRFU uni-verse. AREX housing units werelinked with 70.9 percent of the360,914 Census NRFU housingunits, compared with 88.4 percentof the Census non-NRFU housingunits. For occupied NRFU housingunits, the coverage rate goes up to76.7 percent. Table 19 containsmore details about AREX’s coverageof Census NRFU and non-NRFUhousing units.

There were 24,584 imputed censushousing units in the AREX testsites. AREX housing units werelinked with 62.3 percent of them.AREX addresses were linked with63.2 percent of those that wereimputed to have people in them,and 34.7 percent of those imputed

to be vacant. The linkage ofimputed occupied units was abouttwice that of imputed vacant units,providing face validity for theCensus 2000 imputation.

The coverage by AREX of NRFUhousing units and imputed hous-ing units is not as good as for non-NRFU and non-imputed housingunits. This may be due to severalfactors: (1) components of addresses from NRFU and/orimputed housing units might begenerally of lower quality, and thusharder to match; (2) addresses ofthese housing units may be oftypes that are harder to match,e.g., those in apartment buildings,those on Rural Routes, or at P.O.boxes; and (3) people in thesehousing units may be more likelynot to show up on any of the

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 45

Table 18.Coverage by AREX of Census Housing Units

Total

Linked withAREX

housingunits

(percentof total)

Linked withAREX

occupiedhousing

units(percentof total)

Linked withAREX vacanthousing units

(percentof total)

Census housing units . . . . . . . . . . 1,092,460 889,638 813,688 75,950Percent . . . . . . . . . . . . . . . . . . . . . . (81.4) (74.5) (7.0)Occupied Census housing units . 1,017,273 854,741 787,802 66,939Percent . . . . . . . . . . . . . . . . . . . . . . (84.0) (77.4) (6.6)Vacant Census housing units . . . 75,187 34,897 25,886 9,011Percent . . . . . . . . . . . . . . . . . . . . . . (46.4) (34.4) (12.0)

Table 19.Coverage by AREX of Census Housing Units, by NRFUStatus

Type of Censushousing unit

Total

Linked withAREX

housingunits

Linked withAREX

occupiedhousing

units

Linked WithAREX

VacantHousing

Units

NRFU. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360,914 70.9 60.8 10.1Non-NRFU . . . . . . . . . . . . . . . . . . . . . . . . . 716,450 88.4 82.9 5.5Occupied NRFU. . . . . . . . . . . . . . . . . . . . . 289,224 76.7 67.1 9.6Occupied non-NRFU. . . . . . . . . . . . . . . . . 715,115 88.5 83.0 5.5Vacant NRFU . . . . . . . . . . . . . . . . . . . . . . . 71690 47.6 35.2 12.3Vacant non-NRFU . . . . . . . . . . . . . . . . . . . 1,335 58.7 46.3 12.4

Excludes 15,096 housing units in Census HDF with unknown NRFU status.

10 Recall that AREX vacant housing unitsare those with an address that was linked tothe HDF but for which no persons remainedafter best address selection.

Page 56: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

administrative records used forAREX.

AREX and Census household size

All occupied households

Table 21 shows the distributions ofhousehold size for linked and non-linked occupied households inAREX and for Census. The AREXdistribution of household size wasquite similar to the census distri-bution.

One salient feature of the data wasthat among the unlinked housingunits in both Census and AREX, avery high percentage had one per-son. One possible explanation ofthis fact is that a much higher per-centage of one-person householdswere at basic street addresses at

which there are multiple housingunits, and addresses at such basicstreet addresses (BSAs) were hard-er to link.

Linked occupied and non-occupiedhouseholds

AREX and Census 2000 countedthe same number of people in thehousing unit for 51.1 percent ofthe 889,638 linked households,and AREX was within one of thecensus for 79.4 percent of theunits. The 51.1 percent is effec-tively a ceiling on the percent oflinked households that had exactlythe same persons from AREX andCensus 2000. Although errors inaddress linkage would account forsome of the mismatched house-holds, the deficiencies in adminis-

trative records cited earlier in thisreport–missing children, lack ofspecial population operations andthe time gap between the adminis-trative records extracts and censusday–most likely account for themajor part.

For linked NRFU housing units,AREX had the same numbers ofpersons for 37.0 percent of theunits and was within one 69.3 per-cent of the time. Evidently, Census2000 NRFU housing units are moresusceptible to AREX deficienciesthan non-NRFU units. In addition,enumeration errors (such as “curb-stoning”) in Census 2000 may behigher for these units than forunits that responded to the initialmailout.

For the 15,043 linked imputedoccupied households, AREX hadthe same count for 31.8 percent,and was within one for 66.8 per-cent of these addresses. The lowpercentage of household-by-house-hold agreement between AREX andthe census for imputed householdsshould be expected from the errorintroduced by the imputation.

Demographic comparisons of occu-pied linked households of the samesize

In this section, demographic char-acteristics of linked households arecompared. Because comparisonswithin households of differentsizes are difficult to interpret, onlylinked occupied housing units inwhich AREX and Census 2000 havethe same number of people areconsidered. There are 454,437 ofthese housing units representing42.6 percent of all census housingunits, 42.7 percent of all AREXhousing units, and 51.2 percent ofall linked housing units.

Tables 23-25 contain data only forlinked households for which AREXand the census had the same total

46 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Table 20.Coverage by AREX of Census Housing Units, by ImputationStatus

Type of Censushousing unit

Total

Linked withAREX

housingunits

Linked withoccupied

AREXhousing

units

Linked withvacant AREXhousing units

Imputed . . . . . . . . . . . . . . . . . . . . . . 24,584 62.3 51.7 10.5Non-imputed. . . . . . . . . . . . . . . . . . 1,067,876 81.9 75.0 6.9Imputed occupied . . . . . . . . . . . . . 23,811 63.2 52.6 10.6Non-imputed, occupied. . . . . . . . . 993,462 84.5 78.0 6.5Imputed vacant . . . . . . . . . . . . . . . 773 34.7 25.5 9.2Non-imputed, vacant. . . . . . . . . . . 74,414 46.5 34.5 12.0

Table 21.Distributions of Household Size for Census and AREX forall Five AREX Counties(Occupied housing units only)

Household SizeCensus AREX

Total Percent1 Total Percent2

1 . . . . . . . . . . . . . . . . . . . . . . 276,590 27.2 246,726 27.92 . . . . . . . . . . . . . . . . . . . . . . 331,472 32.6 262,075 29.63 . . . . . . . . . . . . . . . . . . . . . . 171,136 16.8 155,929 17.64 . . . . . . . . . . . . . . . . . . . . . . 142,822 14.0 127,295 14.45 . . . . . . . . . . . . . . . . . . . . . . 60,988 6.0 56,596 6.46 . . . . . . . . . . . . . . . . . . . . . . 21,655 2.1 22,695 2.67-9 . . . . . . . . . . . . . . . . . . . . 11,275 1.1 12,481 1.410+ . . . . . . . . . . . . . . . . . . . . 1,335 0.1 1,625 0.2All Sizes . . . . . . . . . . . . . . . 1,017,273 100 885,422 100

1Percent of all Census occupied housing units2Percent of all AREX occupied housing units.

Page 57: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

count. The tables show the fre-quencies with which AREX and thecensus agree for each:

Sex category;

Race category: White, Black,American Indian/Alaskan Native,Asian/Pacific Islander;

Hispanic origin category, i.e.Hispanic/non-Hispanic;

Five-year age category: 0-4, 5-9,…, 80-84, 85 and up;

Of the age categories: 0-17, 18-64,and 65 and up.

As expected, the agreements forracial composition and Hispanicorigin composition were good – ingeneral, well above 90 percent.Generally household members tendto be all of one race and Hispanicorigin. Also as expected, agree-ment rates did decline with house-hold size because the likelihood ofmissing or AREX imputed race anddifferent Hispanic origin imputa-

tions increases with number ofpersons in the household.

Agreement between AREX and thecensus across 5-year age groupsprovides an estimate of the propor-tion of households with exactly thesame persons because it is improb-able that two different householdswould agree in age distributions in5-year categories. About 81 per-cent of the 445,426 householdshad the same 5-year category

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 47

Table 22.Comparison of Census and AREX Household Size, by NRFU status, and by ImputationStatus(For linked housing units)

AREX person countcompared with Census All Census

housing units

Censusnon-NRFU

housing unitsCensus NRFUhousing Units

Non-imputedCensus

housing units

Imputedvacant

Censushousing units

Imputedoccupied

Censushousing units

Same count . . . . . . . . . . . . . . . . . . 454,437 359818 94619 449,582 71 4,784Percent . . . . . . . . . . . . . . . . . . . . . . (51.1)* (56.8) (37.0) (51.4) (26.5) (31.8)AREX one higher than Census. . 124,706 84269 40437 122,519 95 2,092Percent . . . . . . . . . . . . . . . . . . . . . . (14.0) (13.3) (15.8) (14.0) (35.5) (13.9)AREX one lower . . . . . . . . . . . . . . 127,531 85178 42353 124,355) - 3,176Percent . . . . . . . . . . . . . . . . . . . . . . (14.3) (13.4) (16.5) (14.2) (21.1)AREX 2 or 3 higher. . . . . . . . . . . . 64,635 36769 27866 63,024 77 1,534Percent . . . . . . . . . . . . . . . . . . . . . . (7.3) (5.8) (10.9) (7.2) (28.7) (10.2)AREX 2 or 3 lower . . . . . . . . . . . . 79,848 47938 31910 77,463 - 2,385Percent . . . . . . . . . . . . . . . . . . . . . . (9.0) (7.6) (12.5) (8.9) (15.9)AREX 4 or more higher . . . . . . . . 15,781 6486 9295 15,316 25 440Percent . . . . . . . . . . . . . . . . . . . . . . (1.8) (1.0) (3.6) (1.8) (9.3) (2.9)AREX 4 or more lower . . . . . . . . . 22,700 13158 9542 22,068 - 632Percent . . . . . . . . . . . . . . . . . . . . . . (2.6) (2.1) (3.7) (2.5) (4.2)TOTAL . . . . . . . . . . . . . . . . . . . . . . . 889,638 633,616 256,022 874,327 268 15,043Percent . . . . . . . . . . . . . . . . . . . . . . (100) (100) (100) (100) (100) (100)

* Percents are percents of column total.

Table 23.Comparisons Between AREX and Census for Demographic Groups, for Linked Households(HH) With the Same Number of People Only

HH Size Total linked, ofequal size

Equal for allsex groups1

Equal for allrace groups

Equal for allHispanic

groups

Equal for all5-year age

groups

Equal for agegroups 0-17,

18-64, 65+

Equal for alldemographic

groups3

All sizes . . . . . . . . . . 445,426 291.2 93.4 94.8 81.3 93.1 80.51 . . . . . . . . . . . . . . . . 139,292 92.2 95.1 97.5 82.5 96.1 85.42 . . . . . . . . . . . . . . . . 158,259 93.8 94.8 95.9 83.9 94.0 84.33 . . . . . . . . . . . . . . . . 60,641 87.1 90.7 92.3 75.7 88.4 72.24 . . . . . . . . . . . . . . . . 60,181 89.3 90.7 90.7 80.8 91.7 74.05 . . . . . . . . . . . . . . . . 20,723 86.8 88.9 89.3 77.2 89.0 69.56 . . . . . . . . . . . . . . . . 5,359 80.4 86.0 86.0 68.0 81.8 59.27+ . . . . . . . . . . . . . . . 971 56.8 80.8 83.0 28.7 52.7 28.7

1I.e., the AREX and Census households have the same number of males and the same number of females.2Percents are percents of the total column.3Both sex groups, all race groups, both Hispanic origin groups, and age groups 0-17, 18-64, 65+.

Page 58: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

distribution. This is about 41 per-cent of all linked households.

The agreement rate for linkedhouseholds of the same size issubstantially higher for the agegroup distribution with only threecategories, 0-17, 18-64 and 65 andup due to the increased tolerancefor reporting errors and the greaterprobability of chance agreement.

Table 24 shows that there was lessAREX to Census 2000 agreement

for NRFU households than forother census households, overalland controlling for size. Based onthe 5-year age group match forCensus NRFU households, onlyabout 19 percent of AREX house-holds linked with Census 2000NRFU households seemed to haveexactly the same persons. Asexpected there is even less agree-ment in household characteristicsbetween AREX and Census imputedhouseholds (Table 25).

Factors associated with demo-graphic match rates

Single- and multi-unit BSAs

Table 26 contains data regardingcomparisons of coverage rates,household size, and demographiccharacteristics for single- andmulti-unit BSAs.

For all census household sizes,AREX addresses were less likely tolink with census multi-unit

48 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Table 24.Comparison of AREX and Census Demographic Composition of Households(For linked households with the same number of people only, by size)

HH SizeTotal

Equal for allsex groups1,2

Equal for allrace groups

Equal for allHispanic

groups

Equal for all5-year age

groups

Equal for agegroups 0-17,

18-64, 65+

Equal for alldemo-graphic

groups3

All . . . . . . . . . NRFU 85,774 81.0 87.7 92.3 58.1 84.9 63.4non-NRFU 359,652 93.7 94.7 95.3 86.9 95.0 84.6

1 . . . . . . . . . . NRFU 31,313 82.5 89.3 95.7 57.5 91.1 68.9non-NRFU 107,979 95.0 96.8 98.1 89.7 97.5 90.2

2 . . . . . . . . . . NRFU 24,499 83.7 88.5 92.7 58.6 83.6 64.9non-NRFU 133,760 95.7 96.0 96.5 88.6 95.9 87.9

3 . . . . . . . . . . NRFU 12,549 75.7 85.6 89.4 54.3 77.1 54.8non-NRFU 48,092 90.1 92.1 93.0 81.4 91.4 76.8

4 . . . . . . . . . . NRFU 11,423 79.8 86.3 88.4 63.2 83.3 60.2non-NRFU 48,758 91.5 91.7 91.2 84.9 93.7 77.3

5 . . . . . . . . . . NRFU 4,473 78.1 84.9 87.2 60.4 80.0 56.8on-NRFU 16,250 89.2 90.1 89.9 81.8 91.4 73.0

6 . . . . . . . . . . NRFU 1,269 71.0 80.4 83.0 54.0 73.1 46.8non-NRFU 4,090 83.4 87.8 86.9 72.4 84.6 63.0

7+ . . . . . . . . . NRFU 248 53.6 79.8 81.2 27.0 47.6 24.6non-NRFU 723 58.0 81.2 80.0 29.3 54.5 30.2

1I.e., the AREX and Census households have the same number of males and the same number of females.2Percents are percents of total.3 Both sex groups, all race groups, both Hispanic origin groups, and age groups 0-17, 18-64, 65+.

Table 25.Comparison of AREX and Census Demographic Groups Within Households(For linked households with the same number of people only, by size)

HH Size

TotalEqual for allsex groups

Equal for allrace groups

Equal for allHispanic

groups

Equal for all5-year age

groups

Equalfor age

groups 0-17,18-64, 65+

Equal for alldemo-graphic

groups

All . . . . . . . . . . . . . Imputed 4,784 49.6 74.9 91.7 7.0 60.7 23.0Not imputed 440,642 91.7 93.6 94.8 82.1 93.4 81.2

Page 59: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

addresses than with single-unitaddresses. For linked householdsof equal size, AREX differed fromcensus in all demographic groupsmore often for households atmulti-unit addresses. The differ-ence in percentage of demographicagreement is about 12 percent forhouseholds of size 1 and in theneighborhood of 20 percent forhouseholds of sizes greater than 1.Deficiencies in administrativerecords coverage and timing of the extracts most likely explain the

differences in demographic agreement.

Age of household occupants

The discrepancies between AREXand the census were due partlybecause some households havemoved out of, and others movedinto, addresses between the timeof the administrative records cut-offs and the census. It is possiblethat households containing onlyolder people are less likely tomove, and may yield better AREX

to the census comparisons. Table27 provides address linkage ratesby whether the housing unit is atmulti-unit BSA, and by whether ithas only people 50 and over.Table 28 provides comparisons oflinkage rates, size, and demo-graphics for housing units contain-ing only people 50 and over, andothers. (Tables B.16 and B.17A-Bin Judson and Bauder (2002) con-tain similar comparisons for ages18 and over, and for 65 and over.)

The coverage by AREX of censushouseholds with everyone over 50was slightly, but consistently, high-er. This was true whether control-ling for multi-units or controllingfor size. The comparison forhousehold size and demographicswere much better for one and twoperson households with all mem-bers 50 and over. The demograph-ic comparison was worse forhouseholds of size 3 or more, but

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 49

Table 26.Comparison of Match Rates and Household Comparisons Between Occupied HousingUnits at Multi-Unit BSAs and Housing Units at Single-Unit BSAs

Census HH SizeGroup Total

Linked(percentof total)

Equal size(percent)1

Equal in alldemographic

groups (percent)2

All sizes3 . . . . . . . . . . . . . . . In multi-unit 278,447 188,826 88,517 64,992(67.8) (46.9) (73.4)

In single-unit 738,826 665,915 356,909 293,720(90.1) (53.6) (82.3)

1 . . . . . . . . . . . . . . . . . . . . . . In multi-unit 135,833 91,051 57,218 44,978(67.0) (62.8) (78.6)

In single-unit 140,757 125,568 82,074 74,034(89.2) (65.4) (90.2)

2 . . . . . . . . . . . . . . . . . . . . . . In multi-unit 80,719 55,820 21,788 15,009(69.2) (39.0) (69.3)

In single-unit 250,753 226,676 136,471 118,386(90.4) (60.2) (86.7)

3-4 . . . . . . . . . . . . . . . . . . . . In multi-unit 51,244 35,165 8,567 4,459(68.6) (24.4) (52.0)

In single-unit 237,644 237,644 112,255 83,906(90.5) (47.2) (74.7)

5-6 . . . . . . . . . . . . . . . . . . . . In multi-unit 9,390 6,063 926 456(64.6) (15.3) (49.2)

In single-unit 73,253 65,838 25,156 17115(89.9) (38.2) (68.0)

7+ . . . . . . . . . . . . . . . . . . . . . In multi-unit 1,261 727 18 0(57.7) (2.5)

In single-unit 11,349 10,189 953 279(89.8) (9.4) (29.3)

1Percent of linked.2Percent of linked of equal size.

Table 27.Coverage by Multi vs. Single Unit, and by Household AgeCharacteristics

Type of housing unit Census householdage characteristic Total Percent linked

All HUs . . . . . . . . . . . . . . . . All 50 or older 292,091 85.8Some under 50 639,088 79.9

In multi-unit . . . . . . . . . . . . . All 50 or older 81,480 69.8Some under 50 230,883 62.5

In single-unit . . . . . . . . . . . . All 50 or older 210,661 91.8Some under 50 569,486 86.9

Page 60: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

there were few of those where allmembers were 50 and over.

Race and Hispanic origin of house-hold occupants

Table 29 shows how coverage, sizecomparisons, and race compar-isons, vary with whether there wasa person with race other thanWhite in the household accordingto Census 2000.

For census households with atleast one person other than White,the coverage by AREX is smaller,but not smaller by much, com-pared with households all ofwhose members were White. Onthe other hand, the household sizecomparisons and the racial compo-sition comparisons display moredisagreement for households withat least one person other thanWhite. To some extent, this maybe a consequence of race imputa-tion that would have affected com-parison of households with one ormore persons other than Whitemore often than all White house-holds.

AREX coverage of census address-es did not differ much betweenhouseholds with and without

Hispanics (Table 30). However,households with one or moreHispanics in the census were muchless likely to match correspondingAREX households in size andHispanic/non-Hispanic composi-tion. Differences in householdsizes were most likely due to defi-ciencies in administrative recordscoverage of Hispanics and the ageof the records vis-à-vis the Census.Differences in Hispanic composi-tion within households of equalsize were most likely due to thefact that Hispanic origin wasmodel-based for virtually all AREXpersons.

AREX race imputation

Table 31 concerns linked house-holds in which no person’s AREXrace was imputed, and those inwhich at least one person’s racewas imputed. The comparison wasdone with regard to the racial com-position of the household. Asexpected, households with imput-ed race were less likely to agree onhousehold race composition.Although the overall agreementrate of 86 percent was quite highwhen one or more members hadimputed race, the agreement rate

may have been much smaller whena member was imputed to be of another race.

Predicting AREX/Census householdsimilarity

The purpose of the regressionanalysis was to try to understandmore about those circumstancesunder which AREX administrativerecords households would matchcensus households in both numberand demographic composition.This would also provide a first lookat the potential uses of administra-tive records data to substitute forsome part of the nonresponse fol-lowup or unclassified householdsin a conventional census.

Model specification

For this initial model-buildingattempt, the units of analysis wereall 889,638 one-to-one linkedhouseholds. Separate equationsfor Census 2000 NRFU and unclas-sified households were not esti-mated, but dummy variables wereincluded in the equation for thesetwo types of decennial censushousehold outcomes to seewhether other predictor variableshad accounted for differences in

50 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Table 28.AREX to Census Comparisons by Size of Housing Unit and by Household AgeCharacteristics

Size of HH Census householdage characteristic Total

Linked with AREXhousing units

(percent of total)Equal size(percent)1

Equal in alldemographic

groups2

(percent)3

1 . . . . . . . . . . . . . . . . All 50 or over 148,335 121,781 86,518 78,500(82.1) (71.04) (90.7)

Some under 50 128,235 94,838 52,774 40,512(74.0) (55.7) (76.8)

2 . . . . . . . . . . . . . . . . All 50 or over 137,758 123,412 83,662 76,685(89.6) (67.8) (91.7)

Some under 50 193,714 159,084 74,597 56,800(82.1) (46.9) (76.1)

3+ . . . . . . . . . . . . . . . All 50 or over 5878 5,357 2542 2072(91.1) (47.5) (81.5)

Some under 50 403,233 350,269 145,313 136,147(86.9) (41.5) (93.7)

1Percent of linked households.2Equal in: both sex groups, allrace groups, both Hispanic origin categories, and age groups 0-17, 18-64, 65+.3Percent of linked of equal size.

Page 61: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 51

Table 29.The Effect of the Presence of Persons Other Than White in a Household on HouseholdMatch Rates and Comparisons

Census HH SizeHousehold type Total

Linked with AREXhousing units

(percent of total)Equal size(percent)1

Equal in all four racegroups (percent)2

All sizes . . . . . . . . . . . . . . . . All White 740,218 631,606 358,833 347,592(85.3) (56.8) (96.9)

At least oneOther race 278,799 223,135 86,593 68,356

(80.0) (38.8) (78.9)

1 . . . . . . . . . . . . . . . . . . . . . . All White 205,226 165,098 111,112 108,450(80.5) (67.3) (97.6)

At least oneOther race 71,498 51,121 28,180 24,049

(72.1) (54.7) (85.3)

2 . . . . . . . . . . . . . . . . . . . . . . All White 256,585 221,806 133,180 130,033(86.5) (60.0) (97.6)

At least oneOther race 75,038 60,690 25,079 19,995

(80.9) (41.3) (79.7)

3-4 . . . . . . . . . . . . . . . . . . . . All White 219,030 192,772 93,694 89,491(88.0) (48.6) (95.5)

At least oneOther race 95,207 80,037 27,128 20,105

(84.1) (33.9) (74.1)

5-6 . . . . . . . . . . . . . . . . . . . . All White 53,202 46,707 20,319 19,144(87.8) (43.5) (94.2)

At least oneOther race 29,635 25,194 5,763 3,896

(85.0) (22.9) (67.6)

7+ . . . . . . . . . . . . . . . . . . . . . All White 6,175 5,223 528 474(84.6) (10.1) (89.8)

At least oneOther race 7,421 5,693 443 311

(76.7) (7.8) (70.2)

1Percent of linked households.2Percent of linked households of equal size.

Page 62: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

52 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Table 30.The Effect of the Presence of Hispanics on Household Match Rates

Census HH sizeHousehold type Total

Linked with AREXhousing units

(percent of total) Equal size (percent)1Equal number of

Hispanics (percent)2

All sizes . . . . . . . . . . . . . . . . All non-Hispanic 956,474 803,272 424,867 411,698(84.0) (96.9) (52.9)

At least oneHispanic 62,533 51,469 20,559 10,365

(82.3) (39.9) (50.4)

1 . . . . . . . . . . . . . . . . . . . . . . All non-Hispanic 269,018 210,745 136,114 134,063(78.3) (64.6) (98.5)

At least oneHispanic 7,706 5,874 3,178 1,802

(76.2) (54.1) (56.7)

2 . . . . . . . . . . . . . . . . . . . . . . All non-Hispanic 314,587 268,371 151,588 147,697(85.3) (56.5) (97.4)

At least oneHispanic 17,036 14,125 6,671 4,053

(82.9) (47.2) (60.8)

3-4 . . . . . . . . . . . . . . . . . . . . All non-Hispanic 287,966 250,589 112,467 106,922(87.0) (44.9) (95.1)

At least oneHispanic 26,271 22,220 8,355 3,609

(84.6) (37.6) (43.2)

5-6 . . . . . . . . . . . . . . . . . . . . All non-Hispanic 73,654 64,212 23,831 22,235(87.2) (37.1) (93.3)

At least oneHispanic 9,183 7,689 2,251 876

(83.7) (29.3) (38.9)

7+ . . . . . . . . . . . . . . . . . . . . . All non-Hispanic 11,249 9,355 867 781(83.2) (9.3) (90.1)

At least oneHispanic 2,347 1,561 104 25

(66.5) (6.7) (24.0)

1Percent of linked.2Percent of linked of equal size.

Table 31.The Effect of AREX Imputed Race on Household Comparisons

Census household size Total linked,with equal size

[1]

Households with at least one personwith AREX imputed race

Households with no personwith AREX imputed race

Number(percent of [1])

[2]

Equal in allrace categories(percent of [2])

Number(percent of [1])

[3]

Equal in allrace categories(percent of [3])

All sizes* . . . . . . . . . . . . . . . 445,426 100,416 86,290 345,010 329,658(22.5) (85.9) (77.5) (95.6)

1 . . . . . . . . . . . . . . . . . . . . . . 139,292 5,197 4,099 134,095 128,400(3.7) (78.9) (96.3) (95.8)

2 . . . . . . . . . . . . . . . . . . . . . . 158,259 14,087 11,351 144,172 138,677(8.9) (80.6) (91.1) (96.2)

3-4 . . . . . . . . . . . . . . . . . . . . 120,822 61,389 53,689 59,433 55,907(50.8) (87.5) (49.2) (94.1)

5+ . . . . . . . . . . . . . . . . . . . . . 27,053 19,743 17,151 7,310 6,674(73.0) (86.9) (27.0) (91.3)

5-6 . . . . . . . . . . . . . . . . . . . . 26,082 18,991 16,558 7,091 6,482(72.8) (87.2) (27.2) (91.4)

7+ . . . . . . . . . . . . . . . . . . . . . 971 752 593 291 192(77.4) (78.9) (22.6) (87.7)

* Not including zero.

Page 63: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

AREX/Census 2000 household sim-ilarity noticed in the descriptiveanalyses. This approach assumedthat the regression hyperplanes forthe three types of census house-holds were parallel, an assumptionthat will be tested in future analy-ses.

Also, the analysis reported hereattempted to account for house-hold size agreement and demo-graphic composition simultaneous-ly, providing in some sense, thenet association between predictorsand outcomes. However, it is pos-sible that associations betweenpredictors and household sizeagreement are different from asso-ciations with demographic similari-ty given agreement in size. Thispossibility will also be explored infuture work.

The functional form of the modelis

where Match is a dichotomousdependent variable, x is a vectorof regressors, and ? is a vector ofconstants to be estimated. Foreach linked address, the depend-ent variable was defined as fol-lows:

Predictor variables

The regressors, x, include charac-teristics of AREX addresses andhouseholds that would be availablewere data from administrativerecords to be used in support of aconventional census. The variableis dichotomous, taking on thevalue 1 if the characteristic is pres-ent and 0, otherwise. The interac-tion terms are products of the

individual predictors. The predic-tors are numbered with the “Rownum” in Table 34.

The predictors were chosen afterexamination of an extensive set ofbivariate crosstabulations with thedependent variable. The tabula-tions and associated discussioncan be found in Judson and Bauder(2002).

Address administrative recordssource files

Generally, it was assumed thataddresses appearing in more thanone administrative record sourcefile would be less likely to repre-sent a moving household thanaddresses found in only one file.Additionally households withaddresses in Medicare files wouldlargely represent older persons andrepresent stable households. Thefollowing variables pertain to thesource of the “best” administrativerecords address.

[10] In IRS file – In the IRS 1040file.

[11] In IRMF file – In the IRSInformation Returns MasterFile (i.e. the 1099 file).

[12] In Medicare file – In theMedicare eligibility file

[13] In IRS & IRMF –[10]*[11]

[14] In IRS & MED – [10]*[12]

[15] In MED and IRMF – [11]*[12]

AREX household characteristics

The following are characteristics ofAREX households thought to beassociated with same size anddemographic similarity with thelinked census households. To acertain extent, these variable aresuggested by the descriptive analy-sis of the previous section, keep-ing in mind that the descriptiveanalysis often used census house-

hold characteristics rather thanAREX household characteristics.

[5] One or two persons –Household contains only 1 or 2persons.

[6] No imputed race – No house-hold member has imputedrace.

[7] Hhold has children –Household has one or morechildren under the age of 18.

[8] Hhold has 1+ White –Household has one or moreWhite members.

[9] Hhold all age 65+ – All mem-bers of the household are age65 or over.

[16] Age 65+ & One/two – [5]*[9]

[17] Age 65+ & 1+ White – [8]*[9]

[18] One/two & 1+ White –[5]*[8]

[19] 65+ & 1 or 2 & 1+ White –[5]*[8]*[9]

[21] 65+ and no imputed race –[6]*[9]

AREX/DMAF address characteristics

Single unit addresses are assumedto be more predictable than multi-unit addresses.

[4] Not multi unit – AREX indi-cates that the address is singleunit.

[20] 65+ and Not multiunit –[4]*[9]

[22] No imputed Race and notmulti – [4]*[6]

[23] 65+ & No imp. & not multi– [4]*[6]*[9]

[1] Colorado effect – AREXaddress is in Colorado

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 53

y = 1n P(Match = 1|x)

P(Match = 0|x)= Logit(P(Match=1|x)=xß

Match ={10

if the fully crossed age x race x sex x Hispanic origin distributions in the linked Census householmatch the AREX household; otherwise.

Page 64: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Census 2000 response type

Including variables representingCensus 2000 response type pro-vides a first indication of whetherseparate models might be neededfor each type. (The referencegroup is mailback respondents.)

[2] Enumerator return – NRFUrespondent household

[3] Imputed return – Census2000 whole house imputation

Regression results

Although the regression results areuseful in obtaining an initial under-standing the relationships betweenAREX address and household char-acteristics and AREX/Census Matchstatus, it is important to keep inmind that these matched house-holds are not a nationally represen-tative sample, that the analysis isexploratory in nature, and thatimprovements in administrativerecords processing in the futurewill be substantial. Therefore,these results should be consideredillustrative in nature.

About 38.5 percent of all linkedaddresses also matched on demo-graphics.

All of the variables taken togethersignificantly improved the predic-tion of Match status. The PseudoR-Square value indicates that themodel results in a 19 percentimprovement in the log-likelihoodover the null model of an interceptonly.

Coefficient estimates

Table 34 provides maximum likeli-hood estimates of the coefficientsfor the full model. The rightmostcolumn indicates exponentiatedcoefficients, and can be interpretedas the (multiplicative) change inthe odds of being a match giventhe corresponding characteristic,holding all other variables con-

stant. An exponentiated coeffi-cient of “1” indicates no effect,greater than 1 indicates positiveeffect, and less than 1 indicatesnegative effect.

The presence of interaction termsmakes the interpretation of individ-ual coefficients somewhat difficult.Still, the results for AREX addressand household characteristicsseem generally as expected. AREXhouseholds that are smaller (oneor two members), have only mem-bers aged 65+, have one or moreWhites, and have no members withimputed race tend to be more like-ly to match the correspondingCensus 2000 household. AREXhouseholds at single-unit address-es are more likely to match thecensus than those at multi-unitaddresses.

The negative coefficients on theEnumerator Return and ImputedReturn indicate that these house-

holds remain less predictable otherfactors held constant. Separateequations for Census NRFU house-holds might be required.Households where the censusreturn was imputed are veryunlikely to have the same demo-graphics as their AREX counter-parts and have added some noiseto the coefficient estimates.

The last four rows of the tableindicate the net effect of somecombinations of variables, calculat-ed by multiplying their exponenti-ated coefficients. For example, thetotal effect of being captured inIRS, IRMF, and Medicare however, iseffectively that the household isabout 1.6 times more likely todemographically match.

The effect of having all persons 65or older, at least one White person,and consisting only of a one ortwo person household (given thatthe household is multi-unit and has

54 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Table 32.Overall Response Profile for the Match VariableResponse Profile and Overall Model Fit Statistics

Match status Total frequency Percent

Demographics match . . . . . . . . . . 342,294 (38.5)Demographics do not match . . . . 547,344 (61.5)

Table 33.Goodness-of-Fit Measures for the Logistic RegressionModel

Criterion Intercept only Intercept and covariates

AIC . . . . . . . . . . . . . . . . . . . . . . . . . . 1,185,613.2 1,001,550.2SC . . . . . . . . . . . . . . . . . . . . . . . . . . 1,185,624.9 1,001,831.0–2 Log L . . . . . . . . . . . . . . . . . . . . . 1,185,611.2 1,001,502.2Pseudo R-square. . . . . . . . . . . . . . 0.1869

Test Chi-Square DF Pr>ChiSqLikelihood Ratio 184,108.945 23 <.0001

(full model versus null model of intercept only)

Note: N=889,638 households in two AREX test sites in Colorado and Maryland whose addresseswere computer linked; A household is declared ‘‘matched’’ if its age, race, sex and Hispanic origincomposition is the same across the AREX household and the equivalent census household. AIC is theAkaike Information Criterion; SC is the Schwarz criterion. -2 Log L is -2 times the log likelihood (LL) ofthe model, evaluated at its maximum; R-square is the pseudo R-square value, consisting of (LL(model) -LL(intercept only))/LL(model). The Likelihood Ratio test tests the null hypothesis that all coefficientsexcept the intercept are zero in the population; PrChiSq is the (nominal) probability of obtaining thatChi-Square value by chance; Because observations may not be I.I.D., standard errors may be under-stated and significance levels overstated. (Note is also applicable to Table 34).

Page 65: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

at least one member with imputedrace) is dramatically positive,14.92. Similarly, a household hav-ing all persons 65 or older, notbeing a multiunit address, and hav-ing no imputation from the admin-istrative records (but also otherthan white and more than two per-sons) is about four times morelikely to match census demograph-ics, holding other effects constant.

Goodness of Fit

One way to evaluate the ability ofthe model to correctly predict

household match status is toestablish a decision rule that firstchooses a probability level, c, andthen deems an observation to bedemographically matched if theprobability that Match = 1 for thatobservation, calculated from themodel, is greater than c. Moresuccinctly, for a given level ofc,0≤c≤1 if p[match=1|xβ]≥c predict“AREX household is demographical-ly matched.” Otherwise predict thatthe household is not demographi-cally matched. For a given proba-bility level, there are several meas-

ures that can be used to evaluatethe decision rule.

Accuracy: Proportion of all casescorrectly classified.

False positive: Proportion ofcases where the true match statusis 0 given that the prediction is 1.

False negative: Proportion ofcases where the true match statusis 1 given that the prediction is 0.

Sensitivity: Proportion of caseswhere the prediction is 1 giventhat the true match status is 1.

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 55

Table 34.Maximum Likelihood Parameter Estimates, Standard Errors, and Approximate Tests

Rownumber Variable df Estimate Standard error Wald chi-square PR ChiSq Exp (est)

[0] . . . . . . . Intercept 1 –2.756 0.050 2977.43 <.0001 0.064[1] . . . . . . . Colorado effect 1 –0.102 0.005 379.62 <.0001 0.903[2] . . . . . . . Enumerator return 1 –1.096 0.006 26648.72 <.0001 0.334[3] . . . . . . . Imputed return 1 –3.133 0.110 809.52 <.0001 0.044

[4] . . . . . . . Not multiunit 1 0.926 0.018 2656.05 <.0001 2.525[5] . . . . . . . One or two persons 1 0.982 0.011 7013.33 <.0001 2.672[6] . . . . . . . No imputed race 1 0.790 0.018 1778.60 <.0001 2.205[7] . . . . . . . Household has children 1 0.275 0.007 1239.27 <.0001 1.317[8] . . . . . . . Household has one +

White 1 0.598 0.009 4168.03 <.0001 1.819[9] . . . . . . . Household all age 65+ 1 0.281 0.187 2.25 0.1334 1.325

[10] . . . . . . In IRS file 1 –0.048 0.047 1.04 <0.3075 0.953[11] . . . . . . In IRMF file 1 –0.341 0.047 52.61 <.0001 0.710[12] . . . . . . In Mmdicare file 1 –0.076 0.048 2.50 <0.1136 0.927[13] . . . . . . In IRS & IRMF 1 0.901 0.047 363.32 <.0001 2.462[14] . . . . . . In IRS & medicare 1 –0.488 0.015 996.77 <.0001 0.614[15] . . . . . . In medicare and IRMF 1 0.390 0.047 68.23 <.0001 1.478

[16] . . . . . . Age 65+ & one/two 1 0.870 0.156 30.81 0.0001 2.389[17] . . . . . . Age 65+ & one + White 1 –1.042 0.167 38.63 <.0001 0.353[18] . . . . . . One/two & one + White 1 –0.036 0.013 8.001 <0.0047 1.037[19] . . . . . . 65+ & one or two & one

+ White 1 0.974 0.168 33.25 <.0001 2.649

[20] . . . . . . 65+ and not multi-unit 1 –1.021 0.119 73.41 <.0001 0.360[21] . . . . . . 65+ and no imputed

race 1 0.425 0.105 16.23 <.0001 1.531[22] . . . . . . No imputed race and not

multi 1 –0.630 0.019 1057.22 <.0001 0.532[23] . . . . . . 65+ & no imputed race

& not multi 1 0.657 0.120 29.90 <.0001 1.931

[10]*[11]*[13] . . . . . .

Total effect of capture inIRS and IRMF (given not in Medicare) 1.666

[10]**[15] . . . . .

Total effect of capture inall three files (w/o three-way interaction) 1.401

[5]*[8]*[9]*[16]*[19] . . . . . . Total effect of all of 65+, White, and 1/2 person household 14.92

[4]*[6]*[9]*[20]*[23] . . . . . . Total effect of all 65+, nonmulti-unit, nonimputed race 4.177

Page 66: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Specificity: Proportion of caseswhere the prediction is 0 giventhat the true match status is 0.

Table 35 shows the estimates ofthese quantities for decision ruleprobability levels between .5 and .9.

As can be seen, if we choose thecutoff of 0.5 (so that we predict a“match” when P[Match=1|xβ] isgreater than or equal to .5), weobtain about 184,000 correctmatch predictions, and about458,000 correct nonmatch predic-tions. Dividing the sum of the cor-rect predictions by the total num-ber of cases above the 0.5 cutoff(about 889,000) gives 72.2 percentcorrect predictions (accuracy).Similarly, 54 percent of the match-es were correctly predicted to bematches (sensitivity); 83.7 percentof the nonmatches were correctlypredicted to be nonmatches (speci-ficity); there was a 32.7 percentfalse positive rate and a 25.7 per-cent false negative rate.

As the cutoff level, c, increases(i.e., becomes more stringent), theprobability of making an error indeeming a match status of 1 abovethe cutoff (probability of a falsepositive) declines. For example atc=0.8, the number of correct pre-dictions is 32,373 and the numberof incorrect predictions is 6,546for a total of 38,919. Thus theprobability of a false positive is6,546/38,919 = 0.168. As shownin the table, as c increases, thesensitivity and overall correctness

decline, and specificity and theprobability of a false negativeincrease.

In order to evaluate cutoffs andtheir implications for goodness offit, sensitivity and specificity, wepresent the following evaluativefigures. Figure 18 provides anassessment of the goodness of fitof the obtained logit functionagainst “jittered” outcomes.

In this figure, the ordinate is thevalue of the logit function ln(p/1-p). A 10 percent sample of the889,638 observations are plottedhere. Each individual observation(a linked pair of addresses) is plot-ted as a point near zero or one.The points have been “jittered”slightly to simulate density andavoid overplotting. The abscissa isthe predicted probability that anobservation will be a match. If wechoose 0.5 as our cutoff (so thatwe declare an observation a pre-dicted match is P[match|XB]>0.5),then this corresponds to a logitvalue of zero, and the vertical line.The horizontal line at 0.5 is for ref-erence. Points in the upper righthand quadrant are “hits”–correctpredictions that the demographicsof the households match. Points inthe lower left hand quadrant arealso “hits”–correct predictions thatthe demographics of the house-holds will not match. Points in theupper left hand and lower righthand quadrants are misses–incor-rect predictions. Comparing thepredicted logit function to the

density of the obtained match out-comes assesses goodness of fit.(For more on the development andinterpretation of this graph, seeJudson, 1992. For more goodnessof fit measures and graphics, seeJudson and Bauder, 2002.)

In thinking about using a regres-sion model approach in decidingwhen to substitute an administra-tive records household for a nonre-sponse household in a convention-al census, the probability of falsematch would have to be small, pro-viding confidence that the house-hold substitution was accurate andobviating the need for further enu-meration. But the proportion ofhouseholds in scope for substitu-tion, that is, the proportion ofhouseholds above the decision cut-off level would have to be largeenough to provide substantial sav-ings over face-to-face enumera-tions. For example, from Table 35,a cutoff of 0.8 would provide a rel-atively low probability of false neg-ative, 0.168; but the proportion ofhouseholds in scope for substitu-tion at that cutoff would be about4 percent (38,919/889,638).

Summary and conclusions

General similarity between AREXdata and Census data

A summary of the AREX to Census2000 comparisons is given in Table 36.

The overall coverage of occupiedcensus housing units by AREX was

56 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Table 35.Classification Results for Predicted Probabilities

Probability levelCorrect Incorrect Percentages

Event Nonevent Nonevent Event Correct Sensitivity Specificity False POS False NEG

0.5 . . . . . . . . . . . . . . . . . . . . 184,230 457,943 89,401 158,064 72.2 53.8 83.7 32.7 25.70.6 . . . . . . . . . . . . . . . . . . . . 110,401 506,699 40,645 231,593 69.4 32.3 92.6 26.9 31.40.7 . . . . . . . . . . . . . . . . . . . . 72,335 530,307 17,037 269,959 67.7 21.1 96.9 19.1 33.70.8 . . . . . . . . . . . . . . . . . . . . 32,373 540,798 6,546 309,921 64.4 9.5 98.8 16.8 36.4

Page 67: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 57

Figure 18.Goodness of Fit Diagnostic Plot

Page 68: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

about 84 percent (81 percent ofoccupied and vacant units). Thecoverage of census addresses byadministrative records addressescould be raised substantially byresolving matches that were notone-to-one, by filling coveragegaps in administrative records, andby obtaining administrative recordsextracts at points in time closer tocensus day. Proposals for accom-plishing all of these tasks are pro-vided in Section 5 of this report.

Similarity in size and demographiccomposition of linked AREX andcensus households was rather low.Of the occupied census linkedhouseholds, AREX and the censushad the same number of people in52.1 percent of the cases (51.4percent of all linked households).In 41.9 percent of occupied linkedhouseholds, AREX and the censushad the same number of people,and the same demographic distri-

butions using the three broad agecategories.

About 81 percent of householdshad the same 5-year age distribu-tion and about 93 percent had thesame age distribution in the threebroad groups. This suggests thatthe proportion of households ofthe same size that had exactly thesame persons was somewhere inbetween. The relatively low per-centage of households (80.5 per-cent) with similarity along alldemographic dimensions was duein large part to race and Hispanicorigin imputation and the differ-ence in race categories betweenAREX and the census. It is unlikelythat even improved race andHispanic origin models will be suf-ficient, in themselves, for decenni-al census enumeration. Anotherapproach is currently being devel-oped. (See the discussion inSections 4 and 5.)

In summary, the key deficiency ofthe AREX administrative recordsprocessing was the failure to getthe right number of people (and,therefore, the right people) atmany of the addresses.Dissimilarity of households is ofspecial concern for an AREX-typeof design because of the limitedopportunities to correct that partof the enumeration obtained fromthe administrative records. Thiswill be the biggest challenge forfuture administrative recordsdevelopment.

Similarity between AREX andCensus NRFU and imputed house-holds

There was less similarity betweenAREX and Census NRFU householdsthan non-NRFU households acrossall outcome measures. Theaddress linkage rate for CensusNRFU households was about

58 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Table 36.Summary of Match Rates and Household Comparisons Between AREX and Census

Type of housing unit All of census NRFU Non-NRFUImputed

householdsNonimputedhouseholds

Total Occupied census housing units. . . . . . . . . . . . . . 1,017,273 289,224 728,049 23,811 993,462

Census occupied, linked. . . . . . . . . . . . . . . . . . . . . . . . . 854,741 221,909 632,832 15,043 839,698(84.0) (76.7) (86.9) (63.2) (84.5)

Linked occupied with equal number. . . . . . . . . . . . . . . 455,426 85,774 359,652 4,784 440,642(52.1) (38.7) (56.8) (31.8) (52.5)

AREX and census counts both sex categories . . . . . 406,349 69,488 336,861 2,373 403,976(91.2) (81.0) (93.7) (49.6) (91.7)

AREX and census counts equal in all racecategories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415,948 75,262 340,686 3,583 412,365

(93.4) (87.7) (94.) (74.9) (93.6)AREX and census counts equal in both Hispanicorigin categories. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422,063 79,146 342,917 4,388 417,675

(94.8) (92.3) (95.4) (91.7) (94.8)AREX and census counts in qll 5-year agecategories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362,202 49,833 312,369 335 361,867

(81.3) (58.1) (86.9) (7.0) (82.1)

Equal in age groups 0-17, 18-64, 65+ . . . . . . . . . . . . . 414,668 72,835 341,833 2,905 411,763(93.1) (84.9) (95.1) (60.7) (93.5)

AREX and census counts equal in sex, race,Hispanic origin, and 5-year age groups . . . . . . . . . . . 333,577 43,210 290,367 138 333,439

(74.9) (50.4) (80.7) (2.9) (75.7)AREX and census equal in demographiccomposition: sex, race, Hispanic origin, and agegroups 0-17, 18-64, 65+ . . . . . . . . . . . . . . . . . . . . . . . . 358,712 54,400 304,312 1,099 357,613

(80.5) (63.4) (84.6) (23.0) (81.2)

1Percent of census occupied housing units.2Percent of census linked housing units.3Percent of linked housing units with equal numbers of people.

Page 69: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

77 percent compared to 84 percentfor non-NRFU households. ForNRFU households, AREX and cen-sus agreement on the householdsize was 39 percent (57 percentfor non-NRFU), and agreement onall demographic groups givenagreement on size was 63 percent(85 percent for non-NRFU).

These results suggest that substi-tuting AREX households for NRFUhouseholds in a conventional cen-sus will be more difficult thanmatching households in general. Itmay be that households for whichadministrative records are weakoverlap disproportionately with theCensus NRFU population.However, it may also be the casethat the characteristics of CensusNRFU were more likely to be affect-ed by AREX source file cutoffs andother AREX processing decisions

than non-NRFU households. Alsothe Census 2000 enumeration ofNRFU households may be less reli-able.

Similarity between AREX adminis-trative records households andCensus 2000 unclassified house-holds was substantially weakerthan with the NRFU households.This is not surprising since thecensus was least sure of the statusof these addresses and the personsplaced at them were imputed bythe Census. AREX 2000 did notprovide the information needed toassess whether using administra-tive records to enumerate censusunclassified households would bemore accurate than the imputation.

Predicting household similarity

The logistic regression model pre-dicted modestly well when AREX

and census households matched

demographically. Factors that pre-

dicted demographic matches

included: one or two person

households, households with

exclusively older persons, house-

holds where members are captured

by more than one administrative

record system, households with no

race imputation, and households

that were at single-unit addresses.

A decision rule that deemed

matches if predicted probability

was above 50 percent resulted in

correct match status in 72 percent

of the cases but with a false posi-

tive rate of about 33 percent.

The most stringent cutoff of 80

percent reduced the false positive

rate to 16.8 percent, entailed only

about 4 percent of the household

population.

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 59

Page 70: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

This page intentionally left blank.

Page 71: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

4.1 AdditionalAdministrative RecordsExperiment 2000 evalua-tions

Assess the net impact of clericaland field processes on the bottom-up enumeration

There were originally four opera-tions in AREX 2000 that weredesigned to improve administrativerecords addresses. MAFGOR wasused to geocode city-style address-es that were not coded by comput-er. The RFPA was designed toobtain physical addresses for per-sons whose mailing addresseswere non-city style or P.O. Box andgeocode them. The clericalreviews following the initial matchto the MAF and the FAV weredesigned to validate administrativerecord addresses that did notmatch the MAF. All of these opera-tions were complex and laborintensive.

It is not possible to evaluate eachof the clerical and field operationsseparately because the AREXdesign did not vary these factorsexperimentally, and the RFPAresults were not included in theAREX. Still it may be possible togauge the net effect of the threeoperations on the Bottom-upresults by stepping through theBottom-up process excludingaddress information from any ofthe clerical or field operations andcomparing the results to theBottom-up enumeration.

The impact of eliminating the MAF-GOR, clerical review of the firstDMAF match, and the FAV would

be threefold. First, the effect ofeliminating the three operationswould be to decrease the numberof persons enumerated fromadministrative records. Those per-sons at addresses that did notcomputer geocode, did not matchthe MAF through computer opera-tions, or were found to be validonly by the FAV would be eliminat-ed if their addresses were all ofthese types. Second, selectedaddresses for some individualsenumerated from administrativerecords would change becausetheir current Bottom-up addresswould be eliminated leaving someother address still acceptable tothe Bottom-up process. Finally,because the total number ofacceptable administrative recordaddresses would be smaller, therewould be more non-matched DMAFrecords to canvass in order to com-plete the enumeration. That is, thenumber of addresses brought in bythe Census Pull, simulating thecanvassing, would be larger.

In broad terms, eliminating theresults of the three addressimprovement operations wouldrequire the following steps:

• Recreate the address lists avail-able to the Bottom-up by remov-ing all addresses that were inthe original list due to any ofthe three operations. (Becausethe new address list is a subsetof the original list, an additionalmatch to the DMAF for tabula-tion block codes should not berequired. This ignores the pos-sibility that a MAFGOR codedaddress might have picked up ablock code from the DMAF);

• Recreate the Bottom-up compos-ite person records by matchingthe smaller set of addresses tothe individuals and reapplyingthe address selection rules; and,

• Rematch to the Census 2000HDF in the AREX sites andinclude persons at unmatchedHDF addresses.

There are a number of ways thatthis alternative process could beevaluated. First, the alternativeBottom-up counts could be com-pared with the Census 2000 usingmethodology similar to that ofSection 3. The purpose would beto test if basic enumeration resultswere different from those of theoriginal Bottom-up process.

Matching the two sets of Bottom-up results person by person couldmake a more extensive analysis.This would permit an analysis of(1) persons lost completely to theBottom-up as a result of droppingthe address operations, (2) personswhose address changed within thesite, (3) persons omitted fromadministrative records counts whowere picked up at additionalCensus Pull addresses, and (4) newCensus Pull persons not in the firstAREX Bottom-up results. Ofcourse, a person-to-person matchwould require substantial work.The methodology used by Wagner(2002) in matching the Numidentto the Census 2000 HCUF may pro-vide a useful approach.

Finally, there could be analysesthat focus on the addresses ratherthan the persons. Address analy-ses could address the administra-

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 61

4. Discussion and Recommendations for2010 Planning and Other Census BureauPrograms

Page 72: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

tive records sources of rejectedaddresses, the impact of rejectedaddresses on address selection,and a comparison of addresses forpersons omitted from the adminis-trative records counts but whoshowed up in the Census Pull. Inthe latter, it might be important tounderstand why the Census Pulladdress was not obtained fromadministrative records addressselection.

Adding AREX vacants and undupli-cating bottom-up results

In addition to comparing the twoBottom-up processes describedabove, consideration should begiven to creating two additionalBottom-up enumerations based onproposals offered in the Bottom-upevaluation in Section 3: (1) addingthe vacant AREX addresses to theCensus Pull if they matched theHDF, and (2) unduplicating individ-uals between the Census Pull andthe administrative records. Thispair of Bottom-up enumerationswould appear to be more correctthan the corresponding pair with-out these additional operations.

Repeating AREX 2000 with StARS2000 without clerical or field operations

If eliminating the clerical addressoperations, as discussed in Section2, turns out to be relatively incon-sequential, then there would begreat value in “repeating” the AREXBottom-up process (without theclerical address operations) andthe statistical evaluations withStARS 2000. The reason is that theadministrative records data setsused in StARS 2000 are much clos-er to those that might be availablein an actual administrative recordscensus than those of StARS 1999(putting aside the possibility ofadditional administrative recordssources). Having the results fromthis administrative records baseline

will be important in planning forAREX 2010 because the test itemswill not be confounded with thetiming problems due to the admin-istrative records extract dates inStARS 1999.

Since the purpose of redoing theBottom-up would not be primarilyto compare with the StARS 1999results, operational improvements,such as improved SSN validationfor IRS files that have been incor-porated into StARS 2000 would beappropriate. The StARS 2000 AREXshould include DMAF matchedAREX vacant addresses in theCensus Pull, and there should beunduplication after it. Also, themore timely administrative recordswould provide an opportunity totake a new look at the addressselection algorithm, redirecting theemphasis to choosing the addressthat best reflects residence as ofCensus Day and away from thesomewhat artificial focus on thepresence of block codes.

Using the StARS 2000 files doesnot resolve two of the major limita-tions on the AREX: the handling ofspecial populations, and race andHispanic origin measurement. Forthe latter, consideration should begiven to using Wagner’s race andHispanic origin data (2002).Although this is somewhat circular,again, it might provide results thatare much closer to what wouldhave been achieved in an actualcensus. Correct handling of spe-cial populations may need to awaitfuture experiments.

Analysis of administrative recordscoverage gaps11

In this report, a number of cover-age gaps in administrative recordsfor both adults and children have

been identified; but the populationsizes and characteristics of themissing persons is not known pre-cisely. A linkage of StARS to aSurvey of Income and ProgramParticipation (SIPP) or CurrentPopulation Survey (CPS) sample fora corresponding time period wouldprovide some information aboutthe characteristics of persons inthe survey who were not found inthe administrative records. Thelinkage could be easily accom-plished using the SSNs developedfor those surveys, and using prob-abilistic methods for those withoutSSNs. The subsequent analysiswould not provide a complete lookat the non-covered populationbecause of coverage deficiencies inthe survey, itself, both in terms ofsegments of the population under-represented and persons in thesurvey sample for whom SSNs arenot available. Still, much could belearned about the adults and chil-dren not found in the administra-tive record systems.

A SIPP linkage might be particularlyuseful for missing adults becauseit would identify the governmentprograms in which they are partici-pating and provide details aboutsocial and economic circum-stances. For households in whichadults have been matched but notall of the children, a key focusmight be to identify the adminis-trative records characteristics ofthe adults who then might becomean additional special population foradministrative records census pur-poses. That is, these householdsmight receive a special mailing inorder to obtain a more completeenumeration of the household inAREX 2010.

4.2 Race and HispanicOrigin enhanced Numident

Improved modeling of race andHispanic origin for administrative

62 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

11 Suggestions for additional administra-tive records acquisitions are given in Bye,2002, section 5.

Page 73: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

records will not provide a generalsolution to decennial census meas-urement for two reasons. First, itis unlikely that the models can pro-vide adequate fit for the most diffi-cult to measure groups such asAmerican Indians and AlaskanNatives, Hawaii Islanders, and per-sons classifying themselves asmulti-race. Second, even the bestmodels will always suffer from alack of fit in small geographicareas due to variation in localresponses about model predictedaverages.

Looking ahead to 2010, it is impor-tant to continue to annotate theNumident with survey reportedrace and Hispanic origin in order tomake the annotated Numident ascomplete as possible. TheAmerican Community Survey (ACS)would be a major source of ongo-ing updates, under full implemen-tation.

Because there will always be aresidual subset of the Numidentfor which survey responses to raceand Hispanic origin are not avail-able, there will continue to be aneed for race and Hispanic originmodels. The methods proposedhere make that subset smaller andsmaller.

The initial task of annotating theCensus Numident with race andHispanic origin from Census 2000was described briefly in Section 4,and an ongoing process to fill inthe remaining gaps in theNumident was proposed. Thereare some research activities thatshould be considered in connec-tion with this process.

Evaluation of the initial match

First, the accuracy of theNumident/Census match should beevaluated, perhaps by manualexamination of a sample ofmatched cases. Bye (1999) esti-

mated very high accuracy for asimilar matching process betweenthe SSA Numident and a pair ofACS test sites. His results suggestthat an evaluation sample need notbe large and should be stratifiedby strong and weak stages of thematching operation with thelargest part of the sample comingdisproportionately from the weak-est areas.

Second, the extent to which theinitial match covers a typicaladministrative records populationshould be explored.

Finally, the possibility of augment-ing the initial match between theNumident and the Census shouldbe explored.

Analysis of response variance overtime in reported race and Hispanicorigin

One purpose of continuing toannotate the Numident with raceand Hispanic origin after the initialmatch to Census 2000 is to makethe annotated Numident as com-plete as possible and to have racereports as current as possible foruse in 2010 and the interveningyears. A second purpose would beto study the response variation inself reports over time comparingthe Census 2000 measures tothose obtained in later years fromother surveys. Although the rea-son for observed changes wouldbe confounded somewhat by dif-ferences in mode of administrationof the questions (hopefully the cat-egories would remain unchanged),an analysis of the frequency andnature of reported differenceswould permit an assessment of theefficacy of using reported race orHispanic origin reported at previ-ous times. It would also be usefulto be able to compare change overtime with simple response varianceif the latter measurement is avail-

able from reinterviews in past cen-suses or surveys.

New race and Hispanic origin mod-els

However the enhancement of theNumident is carried out, there willalways be a need for models toimpute race and Hispanic origin tothe residual of persons enumerat-ed from administrative recordswhose Numident record was notenhanced. In such cases, the mod-els developed by Bye (1998), andBye and Thompson (1999) shouldbe discarded, and new modelsshould be estimated from theenhanced Numident itself. Theenhanced Numident would providevery large samples with race meas-urement in the correct format, asituation nonexistent prior to itscreation.

4.3 Household substitutionfor nonresponse followup/unclassified households in2010

Although the results were not con-vincingly strong, due largely to thelimitations on AREX 2000, the ideaof substituting administrativerecords households for NRFU orunclassified households in a con-ventional census merits furtherconsideration. For NRFU house-holds there is the potential for sig-nificant cost savings, and forunclassified households, the poten-tial for greater accuracy than thatprovided by imputation.

The general methodology ofhousehold substitution reported inSection 3 was a two-step approach:Address linkage followed byhousehold substitution in caseswith a high probability of correcthousehold membership. Thisapproach should be tested as partof the 2004 Census Test usingmodels developed from a linkageof StARS 2000 data to the Census

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 63

Page 74: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

2000 HDF. The timing of theadministrative records in StARS2000 would be much closer toCensus Day than the StARS 1999data used in AREX 2000, and muchmore like the data that would beavailable in 2010. Of course, sim-ilar administrative data would haveto be available in 2004, the year ofthe Census Test.

Household level research

The immediate focus of household-level research would be in connec-tion with the repeat of the AREXBottom-up using StARS 2000. Forthis test enumeration, it will beimportant to assess the accuracyof the households formed from themore current administrativerecords using the same kinds ofdescriptive and regression analysesthat were applied to the originalAREX data set.

Redoing the household-level analy-sis using AREX results based onStARS 2000 would give a moreaccurate assessment of the abilityof administrative records to recre-ate Census households without thehandicap of the time lag resultingfrom the use of StARS 1999. Withthese more current data, it wouldbe more reasonable to focus theanalysis on exact person matchesbetween households and notdemographic matches. There areseveral advantages to using exactperson matches. First, it is themost simple in concept: Howoften and under what circum-stances can households beobtained from administrativerecords that are the same as thosein the census? Simply put, whendo we get the same people?

Second, the person match wouldremove the emphasis on race andHispanic origin and the problemsthat imputation introduced into theprevious analysis. Although deter-mining when two groups of per-

sons are matched is more difficultthan matching demographic pat-terns, exact person matchingwould rely primarily on name anddate of birth; and race andHispanic origin would be minormatch keys if used at all. ARRSnow has much experience in per-son matching.

Third, with exact person matching,analyses can be done of “true” nearmisses by the administrativerecords. For example, for house-holds with different numbers ofpersons, dependent variablescould be constructed that indicatethat all of the AREX persons werecontained in the Census householdexcept for 1 (or 2) and vice versa.Not only might these kinds ofmisses be acceptable in certainfuture applications, but also study-ing the characteristics of the miss-ing persons may suggest improve-ments for administrative recordssources or processing.

NRFU and/or imputed householdssubstitution

The possibility of substitutingadministrative records householdsfor NRFU households should beexplored using StARS 2000 datamatched to Census 2000. Thedata would come naturally from aStARS 2000 AREX as discussedabove or could be developed sepa-rately if the StARS 2000 AREX isnot done. The StARS 2000 dualprocess would supply the undupli-cated individuals and a set ofaddresses for each individual towhich an address selection rulewould be applied. Administrativerecord persons assembled by finaladdresses would then be matchedto Census 2000 in more or less thesame way as it was done in theAREX to produce a data set foranalysis. The one major differencebetween a data processing opera-tion focused on substitution and a

full enumeration is that block tabu-lations would not be required inthe substitution approach. Thesample could be limited to theAREX test sites; but if resourcespermit, representative samples ofthe full population of administra-tive records should be used.

The dependent variable could beanalogous to that used above–adichotomous variable indicating anexact household match or not. Butother more lenient variables suchas those suggested above (e.g., allpersons are the same except one)might be used if they representalternatives that might be used in2010.

If an adequate model for NRFU orimputed household substitutioncan be obtained from the StARS2000 records, the results could beused in a 2010 Census test, sched-uled for 2004 and 2006.Assuming that the 2004 test issuccessful, it raises the question ofwhat data would be used to con-struct the model that would actual-ly be used in 2010.

4.4 Person unduplication inthe 2010 Census

It is known that developing undu-plication techniques that have asolid operational and statisticalfoundation is no small task.

There are two difficult parts of per-son unduplication, in particularlong-range unduplication. The firstis determining that in fact the twoenumeration records are a dupli-cate. The second is determiningwhich duplicate record should be“preferred” with respect to geo-graphical location (and, implicitly,which is an erroneous enumera-tion).

For the first problem (duplicatedetection), methods have beendeveloped by the Census Bureau

64 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Page 75: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

for finding candidate duplicatepairs. By adding the administra-tive records data from the NUMI-DENT file onto the source datafiles, we gain a powerful field (theProtected Identification Key, or PIK)for confirming that the candidateduplicate pairs are indeed dupli-cates. This confirmation functionhas the direct effect of reducingfollowup workload. Results fromBean and Bauder (2002) clearlydemonstrate this effect: 86.7 per-cent of the duplicates proposed byan enhanced “Further Study ofPerson Duplication” operation wereconfirmed using administrativerecords data.

The second difficult problem (geo-graphical location) continues to bea challenge. While we expect onlya modest benefit in using adminis-trative records data to make thegeographic placement decision, wereally do not have any hard data toaddress the question. Researchshould determine just how big this“modest” benefit is, and its cost-saving implications.

4.5 Master Address Fileimprovement

Many of the administrative recordsobtained by the Census Bureauinclude addresses for the peopleon those records. One result ofbuilding the StatisticalAdministrative Records System(StARS) database each year is aMAF-like list of these addresses,many of which are geocoded tocensus blocks. This list, the StARSMaster Housing File (MHF), is,except for geocoding, constructedindependently of the MAF and canthus help to evaluate and improvethe MAF by:

• Providing small area tallies tocompute MAF quality metrics

• Providing a comprehensive,accurate and timely source ofdata for change detection

• Assisting with targeting of coun-ties or other areas for updatingpurposes

Other administrative records couldalso be useful to identify newlyconstructed housing units or areaswhere new construction is occur-ring but not yet complete. Thenational scope of StARS combinedwith its precise geography make itvery flexible in assisting with thecompletion of the objectives of theMAF/TIGER Enhancement programand meeting the needs of otherprograms.

Duplicate and multiple MAF IDs

Multiple MAFIDs assigned to a sin-gle address and duplicate MAFIDsassigned to multiple addressescontributed substantially to the dif-ficulty in matching administrativerecords addresses to the DMAF andin classifying addresses asmatched, non-matched, or possiblymatched for subsequent addressoperations. These problems werecompounded in the experimentbecause of the need for a secondmatch to the DMAF to transform“collection” geographic codes to“tabulation” geographic codes.

Address record linkage techniques

The 81 percent link rate betweenadministrative records addressesand the Census 2000 HDF, report-ed in the Household-level analysis,was somewhat lower than expect-ed. In particular, as many as 10percent of administrative recordsaddresses that matched the HDFdid not match on a one-to-onebasis. Improvements in addressediting and standardization and indeveloping tools for addressrecord linkage across databaseshave the potential to yield signifi-cant benefits in increasing linkage

rates. At the same time, the AREXdid not provide an assessment offalsely linked addresses and theircharacteristics. Thus researchneeds to be done on both sides ofthe linkage issue in order to insureimproved linkage of administrativerecords addresses in the future.

4.6 Other Census Bureauprograms

SSN verification and search

The successful development of SSNverification and search methodolo-gy by ARRS is one of the mostvaluable results of administrativerecord research. Unduplication ofpersons in administrative recordsis a crucial step in the develop-ment of StARS/AREX enumerations.The unduplication is based largelyon the SSN; and therefore, avail-ability and correctness of the SSNsin administrative records is crucialto the process. Bye (1997) provid-ed a discussion of SSN verificationapproaches in the context of anadministrative records census.

In addition to decennial censusapplications, there are a variety ofapplications in connection withadministrative records linkage withCensus surveys. These includeSSN verification and search for sur-veys that collect SSNs from respon-dents such as the Survey ofIncome and Program Participationand the Current Population Survey.Additionally, the applicationsinclude SSN search for respondentsin surveys in which the SSN is notrequested, such as the AmericanCommunity Survey.

SSN verification and searchmethodology consists of directsearches of the Census Numidentmatching name and date of birthand indirect searches that useaddress information in administra-tive records to identify possibleSSNs for persons at survey

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 65

Page 76: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

addresses. Probabilistic recordlinkage software is used to associ-ate data reported in surveys withNumident data in order to estab-lish ownership of the SSNs. Moreinformation about these methodsand applications can be found inBye (1999).

Intercensal estimates program

In demographic applications, earlyevaluations of the StARS 1999 and2000 files versus Census 2000 andexisting estimates strongly suggestthat StARS data have the potentialto be a useful “check” on existingcohort-component, ratio, and theso-called “administrative records”estimation method (not to be con-fused with StARS itself). BecauseStARS has the potential to beupdated on a year-by-year basis,this “check” is likely to be particu-larly important in the later years ofthe decade.

Two lines of research appear prom-ising: StARS contributing to totalpopulation estimates at the countylevel, and contributing toAge/Race/Sex/Hispanic OriginEstimates at the county level.

Age/Race/Sex/Hispanic origin esti-mates are particularly importantcomponent of the total estimatesprogram, because they serve asimportant control totals to ongoingsurveys.

As the decade past a decennialcensus proceeds, population esti-mates based on the decennial cen-sus and proceeding forward beginto degrade in quality as local popu-lation changes deviate from thatexpected. The administrativerecords databases, however, pro-vide an annual “snapshot” forcounty and possibly incorporatedcity level estimates. While thissnapshot might be slightly inaccu-rate in level, the year over yearchange in the snapshot could pro-vide an important “check” on exist-ing population estimates.

One significant strength of theStARS system is that many of itsaddresses have been geocoded toCensus Blocks, even specificMaster Address File (MAF) identi-fiers. Using StARS data in a syn-thetic fashion, and using the explo-sion in techniques for small area

estimation, we can consider gener-ating total population estimates attract, block group or even blocklevels. These estimates can thenfeed back into survey frames, ACScontrols, and the like.

Improving current surveys

A related use is to use administra-tive records data to improve nonin-terview weighting for nonresponsein surveys; this also requiresmatching and substitution or mod-eling.

Currently, for ongoing survey non-interviews, noninterview adjust-ment “cells” are constructed byidentifying limited aspects of thenoninterview household. For thesecells, a noninterview adjustmentfactor is calculated. However, withthe administrative records databases (StARS) covering the entirecountry, perhaps improvementscan be made. Two differentapproaches should be tested:Noninterview adjustment cell con-struction, and direct imputationmodeling, each using administra-tive records data.

66 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Page 77: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

References

Aguirre International (1995).Public Concerns About the Use ofAdministrative Records.Unpublished document availablefrom the U.S. Census Bureau, July12, 1995.

Alvey, Wendy and Scheuren, Fritz(1982). Background for anAdministrative Record Census.Proceedings of the Social StatisticsSection, Washington DC: AmericanStatistical Association, 1982.

Bean, S. L. and Bauder, D. M.(2002) Census and AdministrativeRecords Duplication Study. DSSDA.C.E. Revision II MemorandumSeries #PP-44. U.S. Census Bureau:Washington, DC.

Berning, Michael A. (2002).Administrative Records Experimentin 2000: Request for PhysicalAddress Evaluation. Washington,D.C.: U.S. Census Bureau.

Berning, Michael A., and Cook,Ralph H. (2002). AdministrativeRecords Experiment in 2000:Process Evaluation. Washington,D.C.: U.S. Census Bureau.

Buser, Pascal, Huang, Elizabeth,Kim, Jay K., and Marquis, Kent(1998). 1996 Community CensusAdministrative Records FileEvaluation. Administrative RecordsMemorandum Series # 17.Washington, DC: U.S. CensusBureau.

Bye, Barry (1997). AdministrativeRecord Census for 2010 DesignProposal. Washington, DC: UnitedStates Department of Commerce.

Bye, Barry V. (1998). Race andEthnicity Modeling with SSANumident Data. AdministrativeRecords Research MemorandumSeries #19, U.S. Census Bureau.

Bye, Barry V. (1999). Social SecurityNumber Search And Verification At

The Bureau Of The Census:American Community Survey andOther Applications. AdministrativeRecords Research MemorandumSeries #31, U.S. Census Bureau.

Bye, Barry V., and Thompson,Herbert (1999). Race & EthnicityModeling w/SSA Numident Data:Two Level Regression Model.Administrative Records ResearchMemorandum Series #22, U.S.Census Bureau.

Bye, Barry V. (2002).Administrative Records Experiment2000: Consolidated Report DRAFT,9/20/2002, U.S. Census Bureau.

Czajka, John L., Moreno, Lorenzo,Schirm, Allen L. (1997). On theFeasibility of Using InternalRevenue Service Records to Countthe U.S. Population. Washington,DC: Internal Revenue Service.

Edmonston, Barry and Schultze,Charles (1995) Modernizing theU.S. Census, National AcademyPress, Washington DC.

Farber, J.E. and Leggieri, C.A.(2002). Building and Validating aNational Administrative RecordsDatabase for the United States.Paper presented at the NewZealand Conference on DataIntegration, January, 2002.

Flippen, Chenoa and Tienda, Marta(2000). Pathways to Retirement:Patterns of labor ForceParticipation and Labor Market ExitAmong the Pre-RetirementPopulation by Race, HispanicOrigin, and Sex. Journal ofGerontology: Social Sciences,55B:1, S14-27.

Gellman, Robert (1997). Report onthe Census Bureau Privacy PanelDiscussion. Unpublished documentavailable from the U.S. CensusBureau, June 20, 1997.

Heimovitz, Harley K. (2002).Administrative Records Experiment

in 2000: Outcomes Evaluation.Washington, D.C.: U.S. CensusBureau.

Judson, D. H. (1992). A GraphicalMethod for Assessing the Goodnessof Fit of Logit Models. StataTechnical Bulletin, 6:17-19.

Judson, Dean H. (2000). TheStatistical Administrative RecordsSystem: System Design, Successes,and Challenges. Presented at theNISS/Telcordia Data QualityConference, November 30-December 1, 2000.

Judson, D.H., and Bauder, Mark(2002). Administrative RecordsExperiment in 2000: HouseholdLevel Analysis. Washington, D.C.:U.S. Census Bureau.

Knott, Joseph J. (1991).Administrative Records.Memorandum for Distribution List,of the U.S. Census Bureau,Washington DC, U.S. CensusBureau, November 12, 1991.

Miller, Esther, Judson, Dean H., andSater, Douglas (2000). The 100%Census Numident: DemographicAnalysis of Modeled Race andHispanic Origin Estimates BasedExclusively on AdministrativeRecords Data. Presented at the2000 meetings of the SouthernDemographic Association, NewOrleans, LA.

Myrskyla, Pekka (1991). Censusby questionnaire–Census by regis-ters and administrative records:The experience of Finland. Journalof Official Statistics, 7:457-474.

Myrskyla, Pekka, Taeuber, Cynthia,and Knott, Joseph (1996). Uses ofadministrative records for statisti-cal purposes: Finland and theUnited States. Unpublished docu-ment available from the U.S.Census Bureau.

Pistiner, Arona, and Shaw, Kevin A.(2000). Program Master Plan for

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 67

Page 78: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

the Census 2000 AdministrativeRecords Experiment (AREX 2000).Administrative Records ResearchMemorandum Series #49. U.S.Census Bureau.

Singer, Eleanor, and Miller, Esther(1992). Reactions to the use ofAdministrative Records: Results ofFocus Group Discussions. CensusBureau report, Center for SurveyMethods Research, August 24,1992.

Sweet, Elizabeth (1997). UsingAdministrative Record Persons inthe 1996 Community Census.Proceedings of the Section onSurvey Research Methods.Alexandria, VA: American StatisticalAssociation.

Thompson, Herbert (1999). TheDevelopment of a Gender Modelwith SSA Numident Data.Administrative Records ResearchMemorandum Series #32, U.S.Census Bureau.

Wagner, Deborah (2002) RaceEnhanced Numident Project– MatchHCUF to Census Numident FlowCharts, U.S. Census Bureau, May2002.

Zanutto, E. (1996). Estimating apopulation roster from an incom-plete census using mailback ques-tionnaires, administrative records,and sampled nonresponse fol-lowup. Presentation to the U.S.Census Bureau, August 26, 1996.

Zanutto, Elaine, and Zaslavsky,Alan M. (1996). Modeling censusmailback questionnaires, adminis-trative records, and sampled non-response followup, to impute cen-sus nonrespondents. InProceedings, Section on SurveyResearch Methods. Alexandria, VA:American Statistical Association.

Zanutto, Elaine, and Zaslavsky,Alan M. (1997). Estimating a popu-lation roster from an incompletecensus using mailback question-naires, administrative records, andsampled nonresponse followup. InProceedings of the U.S. Bureau ofthe Census Annual ResearchConference. Washington, DC: U.S.Census Bureau.

Zanutto, Elaine, and Zaslavsky,Alan M. (2001). Using administra-tive records to impute for nonre-sponse. To appear in R. Groves,R.J.A. Little, and J.Eltinge (Eds),Survey Nonresponse. New York:John Wiley.

68 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Page 79: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Attachment 1. AREX 2000 Implementation Flow Chart

Method 2 Only (Bottom-Up)

Methods 1 and 2

UnmatchedDMAF

Addresses1.130

Obtain person data from Census 2000

(DSCMO) 1.135

Start

Census 2000 Person Data

1.160

DMAF1.95

Planning & OMB Approval

(PRED) 1.05

Perform clerical review of match results

(PRED) 1.115

Extract test site records to create

AREX 2000 Address File

(PRED) 1.60

National Administrative

Address Records File 1.15

Extract test site records from MD&CO Files

(GEO) 1.65

Extract P.O.Box and rural-style addresses

(PRED) 1.85

Maryland & Colorado (MD&CO) GeocodedFiles (with test site

records flagged) 1.25

Receive MD&COFiles from

GEO (PRED) 1.30

Computer geocode the National File

(GEO) 1.20

Match geocoded AREXaddresses to DMAF

(PRED)1.110

1.753

Clerical Resolutionof Ungeocoded

Addresses (MAFGOR)(GEO/FLD/RCCs)

3

Additional Un-geocoded Test

Site Records1.55

Additional Geocoded Test

Site Records1.50

StARS 1999 Master Housing

File (MHF) forMD&CO 1.40

Extractungeocoded

city-style records(GEO) 1.70

Create StARS 1999 from

MD&CO Files(PRED)

1.35

Perform Exploratory Data Analysis (EDA)

on test sites(PRED) 1.45

AREX Address File(after MAFGOR

operation)(aka 3.45) 1.80

2

Request for Physical Addresses

Mailout & Processing(DSCMO/NPC/GEO/RCCs)

1.902

Unmatched Admin. Record

Addresses1.120

Matched Addresses

1.155

StARS Person Data

1.145

G Q PersonData fromCensus

1.140

Obtain DMAFfrom DSCMO

(PRED)1.100

Pull off address records from DMAF by AREX

test site counties(PRED) 1.105

Acquire NationalAdministrative

Records File(PRED) 1.10

Field Address

Verification &

Processing(FLD/DSCMO/NPC) 1.125

4

Field Address Verification &Processing

(FLD/DSCMO/NPC) 1.125

4

AREX Address File(after Request for Physical

Addresses and Field Address Verification & Processing)

(aka2.150 & 4.145) 1.150

Post-Processing

For details, see AREX 2000 Administrative Records Research File Processing Flowcharts. 1.165

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 69

Page 80: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Attachment 2

StARS Process Steps – Outline

The process steps outline that follows is a synthesizedextract from pertinent StARS 1999 programming speci-fications. The outline is presented here to assist inunderstanding the complex nature (at a high level) ofthe operations required to build the StARS database.For a more detailed description of the processes, referto the StARS specifications available from theAdministrative Records Research Staff. In outline for-mat, the “dual-stream” processing steps in the creationof the StARS 1999 database are as follows:

1. Edit and standardize address data from the national-level source files.

a. Combine all records and split resulting file into1000 ZIP Code cuts in preparation for the Code-1process.

b. Pass records through Code-1 to standardize and“clean” the address data.

c. Unduplicate the address records and create theGEO Extract File.

1) Unduplicate on exact match of all addressfields (full 9-digit ZIP Code).

2) Extract file contains minimum number ofdata fields for TIGER coding.

2. Edit and standardize person demographicdata from national-level files.

a. Name edits and standardization designed toenable record matching, linking, and unduplicationwithin the database once SSNs are verified.

b. Split and sort records into Census Numident seg-ments by Social Security Number (SSN) in prepara-tion for SSN Search and Verification (S&V) phase ofStARS.

3. Verify and validate SSNs by matching and com-paring name data, date-of-birth data, and genderinformation against the Census Numident usingAutoMatch.

a. Pass unverified SSNs through “name/date-of birthsearch” phase using AutoMatch.

b. Differing match cut-off scores and weights estab-lished for each source file.

c. Use Census Numident data to fill missing demo-graphic input data. Demographic data (other than

name fields) for all IRS records derived from CensusNumident.

d. Person records now ready for re-link to thegeocoded address records.

4. Create the Master Housing File (MHF) as follows:

a. Pass the ABI commercial file through Code-1 andthe address standardizer to format and “clean” com-mercial addresses.

b. Unduplicate ABI file (exact match of parsedfields), and assign address type.

c. Pass Geocoded files through the address stan-dardizer to obtain parsed address fields in prepara-tion for record unduplication.

1) Assign address type based on standardizedreturn fields.

2) Unduplicate GEO files based on exact matchof parsed fields within type.

d. Merge unduplicated Geo-coded file with undupli-cated ABI file to identify and flag commercialaddresses within each 3-digit ZIP Code file.

1) Assign a Housing Unit Identification Number(HUID).

2) HUID provides a numeric variable indicatorto assist in selection of the best address foroutput to the final StARS database (the CPR).

e. Update the Master Pointer File (MPF) to enableaddress linkage back to original source files. MPFalso reflects number of duplicate addresses associ-ated with each address selected for retention on theMHF.

f. Merge the MHF and MPF and split resulting fileback to original source cuts.

1) Select only the “current” address fromSelective Service Records

2) Merge split files with source Proxy Files toappend proxy addresses and create EnhancedMaster Pointer File.

5. Create Linked Person Files

a. Use “direct access” method to link person recordswith Enhanced Master Pointer File.

b. UID variable identifies the correct EMPF sourcefile to access for selecting required geographic datafor inclusion on Linked Person File.

70 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Page 81: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

c. Link unverified SSN records in the same fashion.

6. Create the Composite Person Record (CPR) byselecting the “best record” from the Linked PersonFiles as follows:

a. Invoke address selection rules to determine thebest address for the person records. Addressselection rules follow:

1) Select the highest HUID category available.

2) Select a non-proxy address over an addresswith a proxy.

3) Select a non-commercial address over a com-mercial address.

4) Select the address based on source file prior-ity as follows:a) IRS 1040 record

b) Medicare record

c) Indian Health Service record

d) IRS 1099 record

e) Selective Service record

f) HUD TRACs record

5) Select most recent record based on theadministrative record cycle dates.

5) Select first record read-in to the processingarray for output to the CPR.

b. Select the best race based on the followingrules:

1) If American Indian or Alaska Native is reflect-ed on the IHS record, accept the value.

2) If an input value is blank or unknown – deferto the PCF.

3)Select the most frequent occurrence.

4) If tied among occurrences, defer to the PCF.

5) If record is from the “New SSN List,” defer tothe PCF.

6) If ties still occur, select first record read-in.

c. Select the best indicator of Hispanic originbased on the following rules:

1) Most frequent non-blank observation(Numident value counted once).

2) If ties occur, defer to the PCF.

3) If the input value is blank, defer to the PCF.

4) If record is from “New SSN List” and non-blank, output a positive Hispanic origin; ifblank; output a blank value (SSN not on PCF).

d. Select the best gender based on the followingrules:

1) If a Selective Service record available, select“male” gender.

2) Select most frequent occurrence, if noSelective Service record available.

3) If ties occur among the observations, deferto the PCF (using random number probabili-ties).

4) If record from “New SSN List” and reflects ablank value, output a blank value to the CPR; ifties exist among the records, output “female”gender.

e. Select Date of Death (DOD) based on the fol-lowing rules:

1) If Medicare record reflects DOD, output thevalue.

2) If more than one Medicare record reflectsDOD, select the value from the most recentrecord (based on transaction cycle date).

3) If no Medicare record available, output thevalue present on the Numident.

4) If no reported DOD, defer to the PCF usingrandom number probability after calculatinggender.

5) If input is blank and the PCF indicates “alive,”output a blank DOD value.

f. Select the date of birth (DOB) based on the fol-lowing rules:

1) Select the highest DOB score within the fol-lowing source file priority:

a) Medicare

b) Selective Service

c) Census Numident

d) HUD TRACS

e) Indian Health Service

2) If input is blank, output a blank value to theCPR.

U.S. Census Bureau Results From the Administrative Records Experiment in 2000 71

Page 82: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

g. Select the best “name fields” based on the fol-lowing criteria:

1) Highest name score with an exact match oflast name.

2) Exclude all IRS records and records from the“New SSN List.”

3) If only excluded names are in the processingarray, select the first record read-in.

4) If ties occur, select the first record read-in.

7. Each variable is flagged to reflect the decisionrule invoked and the source of the data.Decision rules are established to account for thecharacteristics of each input source date.

72 Results From the Administrative Records Experiment in 2000 U.S. Census Bureau

Page 83: and Evaluation Program Results From the - Census.gov · Nancy W. Osbourn, Arona L. Pistiner, Ronald C. Prevost, Dean M. Resnick, ... U.S. Census Bureau Results From the Administrative

Census 2000 Topic Report No. 7TR-7

Issued March 2004

Data Processing in Census 2000

U.S.Department of CommerceEconomics and Statistics Administration

U.S. CENSUS BUREAU

Census 2000 Testing, Experimentation, and Evaluation Program