random group variance adjustments when hot deck imputation is used to compensate for nonresponse

Random Group Variance Adjustments When Hot Deck Imputation Is Used to Compensate for Nonresponse Richard A. Moore Company Statistics Division US Census Bureau Presented by Samson Adeshiyan


Random Group Variance Adjustments

When Hot Deck Imputation Is Used to Compensate for Nonresponse

Richard A. Moore

Company Statistics Division

US Census Bureau

Presented by

Samson Adeshiyan

2

2002 Survey of Business Owners (SBO): Primary Goal

• Provide Business Ownership Statistics
– State
– Industry
– Demographic Group
• Race --- Native American, Asian, Black, Hawaiian/Pacific Islander, White, Public
• Ethnicity --- Hispanic, Non-Hispanic
• Gender --- Female, Equal, Male

3

SBO Primary Publication Level Statistics

• Black-owned Grocery Stores in North Dakota (ND)
– Number
– Aggregate Sales
– Aggregate Payroll
– Aggregate Employment

4

What Do We Have? (Econ Census and Tax Returns)

• 5.5 mil. companies with paid employees
– Receipts, Payroll, Employment
– Geographic Codes
– Industry Codes
• 17.5 mil. companies without paid employees
– Receipts
– Industry and Geography Codes

5

What Are We Missing for Each Business?

• Race of Ownership

• Ethnicity of Ownership

• Gender of Ownership

• Obtain this from a stratified sample of 2.5 million businesses

6

Distribution at the US Level: 23 Million Companies

• Women --- 28%
• Hispanic --- 7%
• Black --- 5%
• Asian --- 5%
• Native American --- 1%
• Hawaiian/Pacific Islander --- 0.1%

7

Problem 1: Need Sufficient Representation in the Sample

Black-Owned Groceries in ND

• 2002 Estimates
– 78 Black-owned businesses in ND
– 15 of these in Retail
– Only 4 are Grocery Stores

• Can’t list groceries in ND in random order and sample systematically

8

“Modeled Guess” Codes from Admin Info For Each Company

• Response from a Previous SBO
• Population Distribution by ZIP Code
• State/Industry Distribution in 1997 SBO
• Owner(s)' Social Security Number, when Available
– Race/Hispanic/Gender Codes on SSN Application
– Surnames (e.g. LOPEZ or WANG)
– Country of Birth (e.g. Korea or Cuba)
– Decennial Responses

9

Example

• Name …. Michelle Wie’s Pro Shop

• Modeled Guess …. Asian Female

• Likelihood-Race ……. 0.8912

• Likelihood-Hisp ……. 0.0012

• Likelihood-Female …. 0.9500

10

Warning: Model is not 100% accurate

• Michelle Wie’s Pro Shop
– Responds As White, Non-Hispanic, Male
– Tabbed As White, Non-Hispanic, Male

• If Business response is inconsistent with modeled likelihoods, tabulate by the responses

• If a business does not respond, don’t directly infer responses from likelihoods

11

Problem 2: Differential Response Rates Between Demographic Groups

Owner              Likelihood-Hispanic   Response
Jose Martinez      0.985                 Hispanic
John Martinez      0.940                 ???
Jose’s Sub Shop    0.123                 Non-Hispanic
Juanita Martin     0.060                 Non-Hispanic
John Martin        0.040                 Non-Hispanic

12

Likelihoods Aid in Non-Response Adjustment

Case   Likelihood-Hispanic   Response       Weight
1      0.985                 Hispanic       4.0
2      0.940                 ???            4.0
3      0.123                 Non-Hispanic   4.0
4      0.060                 Non-Hispanic   4.0
5      0.040                 Non-Hispanic   4.0

Response-Rate-Adjusted Hispanic-Owned Est … 5.0 (4.0 * 5/4)

Hot Deck Imputed Hispanic-Owned Est … 8.0 (4.0 + 4.0)
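The two estimates on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the SBO production procedure: the slide does not say how the hot deck picks a donor, so the nearest-likelihood donor rule below is an assumption, and the names `cases`, `respondents`, and `impute` are illustrative only.

```python
# Five sampled businesses from the slide:
# (likelihood_hispanic, response, weight). None marks the nonrespondent.
cases = [
    (0.985, "Hispanic", 4.0),
    (0.940, None, 4.0),
    (0.123, "Non-Hispanic", 4.0),
    (0.060, "Non-Hispanic", 4.0),
    (0.040, "Non-Hispanic", 4.0),
]

respondents = [c for c in cases if c[1] is not None]

# Response-rate adjustment: inflate every respondent weight by n / r = 5/4.
rate_factor = len(cases) / len(respondents)
rate_adjusted = sum(
    w * rate_factor for _, resp, w in respondents if resp == "Hispanic"
)


def impute(case):
    """Hot deck step. ASSUMPTION: the donor is the respondent whose
    likelihood is closest to the nonrespondent's likelihood."""
    like, resp, w = case
    if resp is not None:
        return case
    donor = min(respondents, key=lambda d: abs(d[0] - like))
    return (like, donor[1], w)


completed = [impute(c) for c in cases]
hot_deck = sum(w for _, resp, w in completed if resp == "Hispanic")

print(rate_adjusted, hot_deck)  # 5.0 8.0
```

The response-rate adjustment spreads the nonrespondent evenly over all respondents, most of whom are Non-Hispanic, while the likelihood-based donor assigns case 2 (likelihood 0.940) to the Hispanic group.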

13

For Variance: Random Group Replication (RG)

• Considerable number of cases where the modeled guess disagrees with the actual response
– Cases tabbed from other stratum
– Considerable variability in the weights of the tabulated cases

14

Likelihoods Aid in Non-Response Adjustment

Case   Likelihood   Response       Weight   RG   Receipts
1      0.98         Hispanic       4.0      1    10
2      0.94         ???            4.0      2    1
3      0.12         Non-Hispanic   4.0      3    5
4      0.06         Non-Hispanic   4.0      4    6
5      0.04         Non-Hispanic   4.0      5    8

Imputed Hispanic Firms Est = 8

Imputed Hispanic Receipts = 44

15

For Variance Calculation: Wt Adjustment Method

Factors on Responding Firms

• Firms
– Respondents Estimate = 4
– Post Impute Estimate = 8
– Weight Adjustment Factor = 2.0
• Receipts
– Respondents Estimate = 40
– Post Impute Estimate = 44
– Weight Adjustment Factor = 1.1
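The factors above are the post-imputation estimate divided by the respondents-only estimate. A minimal Python sketch of that arithmetic follows, using the two Hispanic-tabbed cases from the running example; the record layout and the helper name `adjustment_factor` are illustrative assumptions, not part of the SBO system.

```python
# Hispanic-tabbed cases after hot deck imputation:
# (weight, receipts, was_imputed)
hispanic_cases = [
    (4.0, 10, False),  # respondent
    (4.0, 1, True),    # hot deck impute
]


def adjustment_factor(weighted_values):
    """weighted_values: list of (weighted_value, was_imputed).
    Returns (respondents estimate, post-impute estimate, factor)."""
    respondents_est = sum(v for v, imputed in weighted_values if not imputed)
    post_impute_est = sum(v for v, _ in weighted_values)
    return respondents_est, post_impute_est, post_impute_est / respondents_est


# Firm counts: each case contributes its weight.
firms = adjustment_factor([(w, imp) for w, _, imp in hispanic_cases])
# Receipts: each case contributes weight * receipts.
receipts = adjustment_factor([(w * r, imp) for w, r, imp in hispanic_cases])

print(firms)     # (4.0, 8.0, 2.0)
print(receipts)  # (40.0, 44.0, 1.1)
```

Each tabulated item gets its own factor (2.0 for firms, 1.1 for receipts), which is why this method needs several factors stored per record.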

16

Oh-Scheuren Adjustment Factor (1983)

r = # respondents

i = # imputed cases

n = i + r = total number of cases

V1 = variance with impute treated as reported

V2 = V1 * (n/r + i/n)
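As a quick numeric check of the factor, the formula above can be written as a small Python function. The function name and the sample counts are illustrative; only the expression `n/r + i/n` comes from the slide.

```python
def oh_scheuren_variance(v1, r, i):
    """Adjusted variance V2 = V1 * (n/r + i/n), with n = r + i.

    v1: variance computed with imputes treated as reported
    r:  number of respondents
    i:  number of imputed cases
    """
    n = r + i
    return v1 * (n / r + i / n)


# With V1 = 1.0 the return value is the adjustment factor itself.
factor_4_of_5 = oh_scheuren_variance(1.0, r=4, i=1)  # 5/4 + 1/5, about 1.45
factor_1_of_2 = oh_scheuren_variance(1.0, r=1, i=1)  # 2/1 + 1/2 = 2.5
print(factor_4_of_5, factor_1_of_2)
```

The factor is sensitive to which response rate is used for the cell: 4 respondents out of 5 cases gives about 1.45, while 1 out of 2 gives 2.5.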

17

Oh-Scheuren Method: Problems with Comparison

• Research developed for Jackknife, not Random Group

• Calculate response rates for cell

• Best response for our example
– Not missing at random
– True response rate is 4 of 5
– Response rate for Hispanics is 1 of 2

18

Donor Imputation Method (RG # Also Donated)

Before imputation:

Case   Likelihood   Response   Weight   RG   Receipts
1      0.98         Hispanic   4.0      1    10
2      0.94         ???        4.0      2    1

After imputation (case 2 takes the donor's response and RG #):

Case   Likelihood   Response   Weight   RG   Receipts
1      0.98         Hispanic   4.0      1    10
2      0.94         Hispanic   4.0      1    1

Imputed Hispanic Firms Est = 8

Imputed Hispanic Receipts = 44

Only RG #1 is non-zero.

Same Estimates. Higher Variances.
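The "same estimates, higher variances" point can be illustrated numerically. The sketch below uses the textbook random group estimator, v = Σ(θ_g − θ̄)² / (k(k − 1)), where θ_g is k times the weighted total in group g. The slides do not state the estimator or the number of groups, so both the formula and `K = 5` (read off the RG column of the five-case example) are assumptions, and `random_group_variance` is a hypothetical helper.

```python
def random_group_variance(records, k):
    """records: list of (weight, value, rg) for the cases tabbed in the cell.
    Returns (estimate, variance) under the textbook random group formula."""
    # Each group's weighted total, scaled by k, estimates the full total.
    group_estimates = [
        k * sum(w * y for w, y, rg in records if rg == g)
        for g in range(1, k + 1)
    ]
    mean = sum(group_estimates) / k
    variance = sum((t - mean) ** 2 for t in group_estimates) / (k * (k - 1))
    return mean, variance


K = 5  # ASSUMPTION: five random groups, as in the example table

# Hispanic firm counts (value = 1 per firm).
own_rg = [(4.0, 1, 1), (4.0, 1, 2)]      # imputed case keeps its own RG (2)
donated_rg = [(4.0, 1, 1), (4.0, 1, 1)]  # imputed case takes the donor's RG (1)

print(random_group_variance(own_rg, K))      # (8.0, 24.0)
print(random_group_variance(donated_rg, K))  # (8.0, 64.0)

# Hispanic receipts (value = receipts).
receipts_own = [(4.0, 10, 1), (4.0, 1, 2)]
receipts_donated = [(4.0, 10, 1), (4.0, 1, 1)]
print(random_group_variance(receipts_own, K))      # (44.0, 1536.0)
print(random_group_variance(receipts_donated, K))  # (44.0, 1936.0)
```

Donating the RG number concentrates the donor and recipient in one group, so the group estimates spread further from their mean while the point estimate is unchanged.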

19

Advantages of Donating RG #

• No need to add multiple factors to record

• No need to calculate factors

• No problems for microdata users

20

Compare the Ratios of the Variances of the Three Methods

R1 = VAR(Oh-Scheuren) / VAR(Weight Adjustment)

R2 = VAR(Donor) / VAR(Weight Adjustment)

Mean for R1 and R2 across publication cells

Std Dev for each of the means of R1 and R2

Null Hypothesis: Ri = 1 (90% confidence)
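One way to carry out this comparison is sketched below in Python. The slides do not name the test statistic, so the two-sided normal critical value 1.645 for 90% confidence is an assumption, `ratio_test` is a hypothetical helper, and the ratio lists are made-up illustrative numbers, not SBO results.

```python
import math
import statistics


def ratio_test(ratios, z_crit=1.645):
    """Test H0: mean ratio = 1 across publication cells.

    Returns (mean, standard error of the mean, significant?).
    ASSUMPTION: two-sided normal test at 90% confidence."""
    mean = statistics.mean(ratios)
    se = statistics.stdev(ratios) / math.sqrt(len(ratios))
    significant = abs(mean - 1.0) > z_crit * se
    return mean, se, significant


# Hypothetical per-cell variance ratios, for illustration only.
example_r1 = [1.10, 1.20, 1.05, 1.15, 1.25, 1.12]
example_r2 = [0.90, 1.10, 1.00, 1.05, 0.95]

mean1, se1, sig1 = ratio_test(example_r1)
mean2, se2, sig2 = ratio_test(example_r2)
print(round(mean1, 3), sig1)  # 1.145 True
print(round(mean2, 3), sig2)  # 1.0 False
```

A mean ratio flagged as not significant corresponds to the asterisked entries in the tables of variance ratios.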

21

Ratio of Variances --- Firm Counts

* Not statistically different from 1.00 at the 90% confidence level

# Imputes     Oh-Sch/Wt   Donor/Wt
1 to 3        1.148       0.984*
4 to 5        1.176       0.963
6 to 9        1.136       0.941
10 to 19      1.087       1.069
20 to 49      1.069       1.205
50 or more    1.053       1.367

22

Ratio of Variances --- Receipts

* Not statistically different from 1.00 at the 90% confidence level

# Imputes     Oh-Sch/Wt   Donor/Wt
1 to 3        1.230       0.958*
4 to 5        1.286       0.876
6 to 9        1.540       0.963*
10 to 19      1.541       0.914
20 to 49      1.499       0.900
50 or more    1.512       0.951

23

Ratio of Variances --- Firm Counts

* Not statistically different from 1.00 at the 90% confidence level

Response Rate   Oh-Sch/Wt   Donor/Wt
45 to 55%       0.930       1.193
55 to 65%       1.076       1.182
65 to 75%       1.153       1.101
75 to 85%       1.130       1.043
85 to 95%       1.153       1.032*

24

Ratio of Variances --- Receipts

* Not statistically different from 1.00 at the 90% confidence level

Response Rate   Oh-Sch/Wt   Donor/Wt
45 to 55%       1.790       0.902
55 to 65%       1.520       0.904
65 to 75%       1.465       0.940
75 to 85%       1.218       0.945
85 to 95%       1.153       0.954

25

Are the differences acceptable?

Firm Count Variance Ratios Differ by 10%

Receipts Variances Differ up to 70%

=>

Firm Count Relative SEs Differ by about 5%

Receipts Relative SEs Differ by up to 30%
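The step from variance ratios to relative standard errors is a square root, since a standard error is the square root of a variance. A short Python check of the two figures follows; the function name is illustrative.

```python
import math


def se_change_pct(variance_ratio):
    """Percent change in the standard error implied by a variance ratio."""
    return (math.sqrt(variance_ratio) - 1.0) * 100.0


firm_count_change = se_change_pct(1.10)  # variance differs by 10%
receipts_change = se_change_pct(1.70)    # variance differs by 70%

print(round(firm_count_change, 1))  # 4.9
print(round(receipts_change, 1))    # 30.4
```

These round to the "about 5%" and "up to 30%" quoted on the slide.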

26

Asian-Owned Retail Operations in New Hampshire in 2002

           Estimate   Published RSE   Max Change in RSE
Firms      210        23%             + 1%
Receipts   $70 Mil    19%             + 6%
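The maximum changes in this table appear consistent with scaling each published RSE by the square root of the largest variance ratio (about 1.10 for firm counts, up to 1.70 for receipts). The slide does not say how the changes were derived, so the Python sketch below is an assumption about the arithmetic, and `max_rse_change` is a hypothetical helper.

```python
import math


def max_rse_change(published_rse_pct, max_variance_ratio):
    """Change in RSE (percentage points) if the variance is inflated by
    max_variance_ratio. ASSUMPTION: RSE scales with sqrt(variance)."""
    return published_rse_pct * (math.sqrt(max_variance_ratio) - 1.0)


firms_change = max_rse_change(23.0, 1.10)     # firm counts
receipts_change = max_rse_change(19.0, 1.70)  # receipts

print(round(firms_change, 1), round(receipts_change, 1))  # 1.1 5.8
```

Rounded to whole percentage points, these give the + 1% and + 6% shown.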

27

Lingering Question

Is the donation of the RG Number sufficient or do we need to augment the resulting variance with a factor (similar to the Oh-Scheuren factor)?

28

Any Questions?

Richard Moore

[email protected]