dual system estimation and census adjustment



  • Dual System Estimation and Census Adjustment

    Stephen E. Fienberg

    Statistics 36-149

    Department of Statistics

    Carnegie Mellon University

    November 27-29, 2001

  • What Do the Following Populations Have in Common?

    fish*

    penguins

    homeless

    prostitutes in Glasgow

    Italians with diabetes*

    people in the U.S.**

    people with HIV virus

    adolescent injuries in Pittsburgh, PA

    WWW

  • Example 1: Diabetes Prevalence

    Bruno et al. (1994) used 4 sources for ascertainment of diabetes in Casale Monferrato, Northern Italy:

    s1: diabetes clinic and/or family physicians

    s2: patients discharged with diagnosis from hospitals

    s3: insulin or oral hypoglycaemic prescriptions

    s4: requests for reimbursement for insulin and reagent strips

  • Example 1: Diabetes (cont.)

    s1             Yes    Yes    No     No
    s2             Yes    No     Yes    No
    s3     s4
    Yes    Yes      58     46     14      8
    Yes    No      157    650     20    182
    No     Yes      18     12      7     10
    No     No      104    709     74      -

    n = 2069

  • Example 2: Fish in a Lake

    200 fish caught 1st time

    150 fish caught 2nd time

    Of 150 fish in 2nd sample, 125 were among 200 counted in 1st sample

    Total number of fish caught = 200 + (150 - 125) = 225

    But how many fish have gone undetected?

  • Example 2: Fish in a Lake

    Proportion of fish in 2nd sample also in 1st = 125/150 = 5/6

    Generalize from sample to population:

    (5/6) $\hat{N}$ = 200

    $\hat{N}$ = (6/5) × 200 = 240

    This is the method of capture-recapture due to Petersen, Lincoln, Schnabel, etc.
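    The fish arithmetic above can be reproduced in a few lines. This is a minimal sketch; the function name `lincoln_petersen` is chosen here for illustration and does not come from the slides.

```python
def lincoln_petersen(n1, n2, recaptured):
    """Two-sample capture-recapture estimate of the population size N."""
    if recaptured <= 0:
        raise ValueError("need at least one recaptured individual")
    return n1 * n2 / recaptured


n1, n2, a = 200, 150, 125        # 1st catch, 2nd catch, fish seen in both
observed = n1 + (n2 - a)         # distinct fish caught at least once
n_hat = lincoln_petersen(n1, n2, a)
undetected = n_hat - observed    # estimated fish never caught
print(observed, n_hat, undetected)
```

    With the slide's numbers, the estimated count of undetected fish is the gap between the estimate of 240 and the 225 fish actually seen.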

  • Capture-Recapture Model

                         Sample 2
                      In      Out       Total
    Sample 1  In      a       b         n1
              Out     c       d = ??    N - n1
              Total   n2      N - n2    N = ??

    $\hat{N} = n_1 n_2 / a$

  • Role of Independence

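  • Some Formal Details

    Alternatively, we think in terms of the ratio of odds for row 1 vs. odds for row 2:

    $$\frac{P\{A \text{ and } B\} / P\{A \text{ and } B^c\}}{P\{A^c \text{ and } B\} / P\{A^c \text{ and } B^c\}} = \frac{P\{A \text{ and } B\}\, P\{A^c \text{ and } B^c\}}{P\{A^c \text{ and } B\}\, P\{A \text{ and } B^c\}}$$

    and under independence this equals 1.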
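    To see why the ratio equals 1, one can write out the four cell probabilities under independence. The shorthand $p_A = P\{A\}$ and $p_B = P\{B\}$ is introduced here for the sketch and is not used on the slides.

```latex
\begin{align*}
P\{A \text{ and } B\} &= p_A\, p_B, &
P\{A \text{ and } B^c\} &= p_A (1 - p_B), \\
P\{A^c \text{ and } B\} &= (1 - p_A)\, p_B, &
P\{A^c \text{ and } B^c\} &= (1 - p_A)(1 - p_B),
\end{align*}
\[
\frac{P\{A \text{ and } B\}\, P\{A^c \text{ and } B^c\}}
     {P\{A^c \text{ and } B\}\, P\{A \text{ and } B^c\}}
= \frac{p_A\, p_B\, (1 - p_A)(1 - p_B)}{(1 - p_A)\, p_B\, p_A\, (1 - p_B)} = 1 .
\]
```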

  • Some Formal Details

    Back to data.

    We think of independence in terms of equality of odds, and we set

    $ad / bc = 1$

    and estimate unobserved d by

    $\hat{d} = bc / a$

    $\hat{N} = a + b + c + bc/a = n_1 n_2 / a$
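    The last equality can be checked directly, taking $n_1 = a + b$ and $n_2 = a + c$ as the margins of the capture-recapture table:

```latex
\[
\hat{N} = a + b + c + \frac{bc}{a}
        = \frac{a^2 + ab + ac + bc}{a}
        = \frac{(a + b)(a + c)}{a}
        = \frac{n_1 n_2}{a} .
\]
```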

  • More Formal Version

                         Sample 2
                      In      Out     Total
    Sample 1  In      125     75      200
              Out     25      ?
              Total   150

    $\hat{N} = n_1 n_2 / a = (200 \times 150) / 125 = 240$

  • Example 1: Diabetes

    Looking at Pairs of Lists

    Estimated s.e.s are on the order of 100. Only 3 of 6 estimates exceed n = 2069.

    Pair      $\hat{N}$
    s1, s2    2,351
    s1, s3    2,185
    s1, s4    2,262
    s2, s3    2,057
    s2, s4      803
    s3, s4    1,555
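    The pairwise estimates can be approximated from the four-list table in Example 1. The sketch below applies the plain $n_1 n_2 / a$ estimator to each pair of lists, together with a common large-sample standard-error approximation; both choices are assumptions made here, not taken from the slides. The results land within a few units of the figures quoted above but not exactly on them, so the slide may have used a slightly different variant of the estimator.

```python
from itertools import combinations

# Counts from the Example 1 table, keyed by (s1, s2, s3, s4) membership:
# 1 = listed by that source, 0 = not listed. The all-zero cell is unobserved.
counts = {
    (1, 1, 1, 1): 58,  (1, 0, 1, 1): 46,  (0, 1, 1, 1): 14, (0, 0, 1, 1): 8,
    (1, 1, 1, 0): 157, (1, 0, 1, 0): 650, (0, 1, 1, 0): 20, (0, 0, 1, 0): 182,
    (1, 1, 0, 1): 18,  (1, 0, 0, 1): 12,  (0, 1, 0, 1): 7,  (0, 0, 0, 1): 10,
    (1, 1, 0, 0): 104, (1, 0, 0, 0): 709, (0, 1, 0, 0): 74,
}


def pair_estimate(i, j):
    """Two-list estimate of N from lists i and j (0-based), ignoring the others."""
    n_i = sum(c for key, c in counts.items() if key[i])
    n_j = sum(c for key, c in counts.items() if key[j])
    both = sum(c for key, c in counts.items() if key[i] and key[j])
    n_hat = n_i * n_j / both
    # Large-sample approximation to the standard error (an assumed formula).
    se = (n_i * n_j * (n_i - both) * (n_j - both) / both**3) ** 0.5
    return n_hat, se


n_observed = sum(counts.values())
estimates = {
    (i + 1, j + 1): pair_estimate(i, j) for i, j in combinations(range(4), 2)
}
for (i, j), (n_hat, se) in estimates.items():
    flag = "" if n_hat > n_observed else "  (below the observed count)"
    print(f"s{i}, s{j}: N-hat = {n_hat:7.1f}, approx. s.e. = {se:5.1f}{flag}")
```

    Any estimate below the 2069 people actually observed cannot be right, which is the warning sign the slide points to.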

  • Diabetes Example: What is Going Wrong?

    Independence of lists in the pairs!

  • Capture-Recapture Assumptions

    Random samples

    Independence

    Closed population

    Perfect matching (no tag loss)

    Homogeneity

    How do we check on assumptions?

    The problem of the wily trout.

  • Accuracy and Coverage Evaluation Survey

    Survey of approximately 314,000 households (HH) in 11,000 blocks.

    Used to correct raw census counts using capture-recapture or dual systems estimation methodology.

    Corrects for omissions AND erroneous enumerations.

  • ACE Design

    Two parts to ACE sample of blocks:

    sample of population: P-sample

        used to estimate omissions

        matched records against those for census

    sample of census: E-sample

        used to estimate erroneous enumerations

        subtract out EEs from census counts before using DSE

  • Dual Systems Components

                                  Census
                      In                    Out                Total
    Sample    In      Matches               ACE Non-Matches    ACE Total
              Out     Census Non-Matches    Missed in Both
              Total   Census Total                             Population Total

    (The table is repeated with the sample labeled PES in place of ACE.)

  • DSE With Same Values As Fish

    nCEN = census count - EEs

                                Census
                      In            Out         Total
    Sample    In      a = 125       c = 75      nACE = 200
              Out     b = 25        d = ?       N - nACE
              Total   nCEN = 150    N - nCEN    N

    $\hat{N} = n_{CEN}\, n_{ACE} / a = 150 \times 200 / 125 = 240$
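    A minimal sketch of the same calculation, with the erroneous enumerations subtracted before the dual system formula is applied. The function name and the split of nCEN = 150 into a raw census count of 160 and 10 EEs are hypothetical; only nCEN, nACE and the match count come from the slide.

```python
def dual_system_estimate(census_count, erroneous_enumerations, n_ace, matches):
    """DSE of the population total after removing erroneous enumerations."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    n_cen = census_count - erroneous_enumerations   # corrected census count
    return n_cen * n_ace / matches


# Hypothetical raw count and EEs chosen so that n_cen = 160 - 10 = 150.
n_hat = dual_system_estimate(
    census_count=160, erroneous_enumerations=10, n_ace=200, matches=125
)
missed_in_both = n_hat - (125 + 75 + 25)   # estimate of the unobserved cell d
print(n_hat, missed_in_both)
```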

  • DSE Features in 2000

    Excluded homeless/shelters and group quarters from calculations in 2000

    Adjusted sample counts for movers

    Searching in adjacent blocks

  • Some Practical Issues

    How big is d relative to c?

    Within HH vs between HH omissions

    Counts of zero

    Negative adjustment factors

  • Dual Systems Assumptions

    Perfect matching

        idea of probabilistic matching with variable probabilities for different individuals

    Homogeneity

    Dependence between sample and census

        heterogeneity and dependence get combined in what is called correlation bias

    Errorless assessment of erroneous enumerations
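    The effect of heterogeneity can be illustrated with expected counts for a made-up population. In this sketch the group sizes and capture probabilities are hypothetical; the census and the sample are independent within each group, yet pooling the groups before applying the dual system formula still gives an estimate that is too low.

```python
# Hypothetical population: two equal-sized groups with different chances of
# being counted. Within each group the census and the sample are independent.
groups = {
    "easy to count": {"size": 500, "p_census": 0.9, "p_sample": 0.9},
    "hard to count": {"size": 500, "p_census": 0.5, "p_sample": 0.5},
}


def expected_counts(g):
    """Expected census total, sample total, and matches for one group."""
    n_cen = g["size"] * g["p_census"]
    n_ace = g["size"] * g["p_sample"]
    matches = g["size"] * g["p_census"] * g["p_sample"]
    return n_cen, n_ace, matches


def dse(n_cen, n_ace, matches):
    """Dual system estimate from two list totals and their overlap."""
    return n_cen * n_ace / matches


per_group = [expected_counts(g) for g in groups.values()]
pooled_totals = [sum(column) for column in zip(*per_group)]

pooled = dse(*pooled_totals)                           # groups lumped together
stratified = sum(dse(*counts) for counts in per_group) # DSE within each group
true_total = sum(g["size"] for g in groups.values())
print(f"true N = {true_total}, pooled = {pooled:.1f}, stratified = {stratified:.1f}")
```

    Applying the formula separately within each homogeneous group recovers the true total here, while the pooled estimate falls short of it.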

  • ACE Implementation

    Aggregate counts from census blocks for various demographic and racial/ethnic groups.

    Apply DSE for these aggregates (called post-strata).

    Generalize from adjustments for the ACE sample of blocks and strata to the nation.

        synthetic error

  • Post-strata

    Instead of doing DSE at the block level, we reorganize the data by grouping parts of blocks according to

        age

        race/ethnicity

        sex

        occupancy status

        mail return rate

    Results in over 480 post-strata, and we apply DSE in each.

  • What Do We Know About Dual Systems Assumptions at Post-strata Level?

  • Synthetic Assumption

    Carrying the adjustments back to the individual blocks not in the ACE sample:

    Assumes the homogeneity of all of those parts of blocks in each post-stratum.

    Result is that some blocks increase and some blocks decrease in estimated population size

        decreases total 1 million

        increases total 4.3 million
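    One way to picture the synthetic step is as a per-post-stratum factor applied to every block's census counts. The sketch below assumes the factor is the dual system estimate divided by the raw census count for the post-stratum; that definition, the post-stratum labels, and all numbers are hypothetical.

```python
# Hypothetical post-stratum inputs from the sample blocks (illustrative only).
post_strata = {
    "A": {"census_count": 1000, "ee": 50, "n_ace": 900, "matches": 810},
    "B": {"census_count": 1000, "ee": 80, "n_ace": 950, "matches": 893},
}


def adjustment_factor(s):
    """Assumed definition: DSE for the post-stratum over its raw census count."""
    n_cen = s["census_count"] - s["ee"]
    n_hat = n_cen * s["n_ace"] / s["matches"]
    return n_hat / s["census_count"]


factors = {name: adjustment_factor(s) for name, s in post_strata.items()}

# Hypothetical census counts for blocks outside the sample, by post-stratum.
blocks = {
    "block 1": {"A": 40, "B": 10},
    "block 2": {"A": 5, "B": 60},
}

# Synthetic assumption: every part of a post-stratum gets the same factor.
adjusted = {
    block: sum(count * factors[ps] for ps, count in parts.items())
    for block, parts in blocks.items()
}
for block, parts in blocks.items():
    print(block, sum(parts.values()), round(adjusted[block], 1))
```

    In this made-up case one post-stratum has a factor above 1 and the other below 1, so a block's adjusted count rises or falls depending on its mix of post-strata.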

  • March 2001 Adjustment Decision

    Not ready to adjust using DSE.

    Concerns:

        DA

        loss functions

        counties under 100,000

        balancing error

        synthetic error

  • Oct. 2001 Adjustment Decision

    Still not ready to adjust!

    Old concerns:

        DA

        loss functions?

        balancing error - no

        synthetic error - no

    New concern:

        missed EEs in ACE