Random Group Variance Adjustments
When Hot Deck Imputation Is Used to Compensate for Nonresponse
Richard A. Moore
Company Statistics Division
US Census Bureau
Presented by
Samson Adeshiyan
2
2002 Survey of Business Owners (SBO): Primary Goal
• Provide Business Ownership Statistics
– State
– Industry
– Demographic Group
• Race --- Native American, Asian, Black, Hawaiian/Pacific Islander, White, Public
• Ethnicity --- Hispanic, Non-Hispanic
• Gender --- Female, Equal, Male
3
SBO Primary Publication Level Statistics
• Black-owned Grocery Stores in North Dakota (ND)
– Number
– Aggregate Sales
– Aggregate Payroll
– Aggregate Employment
4
What Do We Have? (Econ Census and Tax Returns)
• 5.5 mil. companies with paid employees
– Receipts, Payroll, Employment
– Geographic Codes
– Industry Codes
• 17.5 mil. companies without paid employees
– Receipts
– Industry and Geography Codes
5
What Are We Missing For Each Business?
• Race of Ownership
• Ethnicity of Ownership
• Gender of Ownership
• Obtain this from a stratified sample of 2.5 million businesses
6
Distribution At the US Level: 23 Million Companies
• Women --- 28%
• Hispanic --- 7%
• Black --- 5%
• Asian --- 5%
• Native American --- 1%
• Hawaiian/Pacific Islander --- 0.1%
7
Problem 1: Need Sufficient Representation in the Sample
Black-Owned Groceries in ND
• 2002 Estimates
– 78 Black-owned businesses in ND
– 15 of these in Retail
– Only 4 are Grocery Stores
• Can’t list groceries in ND in random order and sample systematically
8
“Modeled Guess” Codes from Admin Info For Each Company
• Response from a Previous SBO
• Population Distribution by ZIP Code
• State/Industry Distribution in 1997 SBO
• Owner’(s) Social Security Number when Available
– Race/Hispanic/Gender Codes on SSN Application
– Surnames (e.g. LOPEZ or WANG)
– Country of Birth (e.g. Korea or CUBA)
– Decennial Responses
9
Example
• Name …. Michelle Wie’s Pro Shop
• Modeled Guess …. Asian Female
• Likelihood-Race ……. 0.8912
• Likelihood-Hisp ……. 0.0012
• Likelihood-Female …. 0.9500
10
Warning: Model is not 100% accurate
• Michelle Wie’s Pro Shop
– Responds As White, Non-Hispanic, Male
– Tabbed As White, Non-Hispanic, Male
• If Business response is inconsistent with modeled likelihoods, tabulate by the responses
• If a business does not respond, don’t directly infer responses from likelihoods
11
Problem 2: Differential Response Rates Between Demographic Groups
Owner             Likelihood-Hispanic   Response
Jose Martinez     0.985                 Hispanic
John Martinez     0.940                 ???
Jose’s Sub Shop   0.123                 Non-Hispanic
Juanita Martin    0.060                 Non-Hispanic
John Martin       0.040                 Non-Hispanic
12
Likelihoods Aid in Non-Response Adjustment
#   Likelihood-Hispanic   Response       Weight
1   0.985                 Hispanic       4.0
2   0.940                 ???            4.0
3   0.123                 Non-Hispanic   4.0
4   0.060                 Non-Hispanic   4.0
5   0.040                 Non-Hispanic   4.0
Response Rate Adjusted Hispanic-owned Est … 5.0 (4.0 * 5/4)
Hot Deck Imputed Hispanic-owned Est … 8.0 (4.0 + 4.0)
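The two estimates on this slide can be reproduced in a few lines of Python. This is only a sketch: the record layout and the nearest-likelihood donor rule are assumptions made for illustration, since the slides do not state how the donor is chosen.

```python
# Five sampled records from the slide: (likelihood_hispanic, response, weight).
# response is None for the nonrespondent (shown as ??? on the slide).
records = [
    (0.985, "Hispanic", 4.0),
    (0.940, None, 4.0),
    (0.123, "Non-Hispanic", 4.0),
    (0.060, "Non-Hispanic", 4.0),
    (0.040, "Non-Hispanic", 4.0),
]


def response_rate_adjusted(records, group="Hispanic"):
    """Inflate the respondent-based estimate by (sampled cases / respondents)."""
    respondents = [r for r in records if r[1] is not None]
    factor = len(records) / len(respondents)  # 5/4 in the slide example
    return sum(w for _, resp, w in respondents if resp == group) * factor


def hot_deck_imputed(records, group="Hispanic"):
    """Fill each missing response from the respondent with the closest likelihood.

    The nearest-likelihood rule is a hypothetical donor-selection rule.
    """
    respondents = [r for r in records if r[1] is not None]
    total = 0.0
    for likelihood, resp, weight in records:
        if resp is None:
            donor = min(respondents, key=lambda d: abs(d[0] - likelihood))
            resp = donor[1]
        if resp == group:
            total += weight
    return total


rate_est = response_rate_adjusted(records)
hot_deck_est = hot_deck_imputed(records)
print(rate_est, hot_deck_est)
```

The flat response-rate adjustment spreads the nonrespondent across all groups, while the likelihood-guided hot deck assigns the whole weight to the group the nonrespondent most resembles.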
13
For Variance: Random Group Replication (RG)
• Considerable number of cases where the modeled guess disagrees with the actual response
– Cases tabbed from other stratum
– Considerable variability in the weights of the tabulated cases
14
Likelihoods Aid in Non-Response Adjustment
#   Likelihood   Response       Weight   RG   Receipts
1   0.98         Hispanic       4.0      1    10
2   0.94         ???            4.0      2    1
3   0.12         Non-Hispanic   4.0      3    5
4   0.06         Non-Hispanic   4.0      4    6
5   0.04         Non-Hispanic   4.0      5    8
Imputed Hispanic Firms Est = 8
Imputed Hispanic Receipts = 44
15
For Variance Calculation: Wt Adjustment Method
Factors on Responding Firms
• Firms
– Respondents Estimate = 4
– Post Impute Estimate = 8
– Weight Adjustment Factor = 2.0
• Receipts
– Respondents Estimate = 40
– Post Impute Estimate = 44
– Weight Adjustment Factor = 1.1
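The factors above are ratios of the post-imputation estimate to the respondents-only estimate. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the inputs are the Hispanic firm and receipts figures from the five-record example.

```python
def weight_adjustment_factor(respondent_estimate, post_impute_estimate):
    """Factor that scales the respondents-only estimate up to the imputed estimate."""
    return post_impute_estimate / respondent_estimate


# Hispanic firms: one respondent with weight 4.0; the imputed estimate is 8.
firms_factor = weight_adjustment_factor(4.0, 8.0)

# Hispanic receipts: respondent contributes 4.0 * 10 = 40; the imputed estimate is 44.
receipts_factor = weight_adjustment_factor(40.0, 44.0)

# Applying each factor to the responding record alone recovers the imputed estimates.
adjusted_firms = 4.0 * firms_factor
adjusted_receipts = 4.0 * 10 * receipts_factor

print(firms_factor, receipts_factor)
```

Each tabulated item needs its own factor, so a responding record carries one factor per item.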
16
Oh-Scheuren Adjustment Factor (1983)
r = # respondents
i = # imputed cases
n = i + r = total number of cases
V1 = variance with impute treated as reported
V2 = V1 * (n/r + i/n)
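As a worked illustration of the formula above, the sketch below evaluates the factor for the five-record example (r = 4 respondents, i = 1 imputed case). Applying it to that example is an assumption for illustration; the function name is hypothetical.

```python
def oh_scheuren_variance(v1, r, i):
    """Return V2 = V1 * (n/r + i/n), with n = r + i, as defined on the slide."""
    if r <= 0:
        raise ValueError("need at least one respondent")
    n = r + i
    return v1 * (n / r + i / n)


# With V1 = 1.0 the result is the adjustment factor itself: 5/4 + 1/5.
factor = oh_scheuren_variance(1.0, r=4, i=1)
print(round(factor, 2))
```

With no imputed cases (i = 0) the factor is 1 and V2 equals V1.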
17
Oh-Scheuren Method: Problems with Comparison
• Research developed for Jackknife, not Random Group
• Calculate response rates for cell
• Best response for our example
– Not Missing at Random
– True response rate is 4 of 5
– Response rate for Hispanics is 1 of 2
18
Donor Imputation Method (RG # Also Donated)
#   Likelihood   Response   Weight   RG   Receipts
Before imputation:
1   0.98         Hispanic   4.0      1    10
2   0.94         ???        4.0      2    1
After imputation (record 2 takes the donor's response and RG #):
1   0.98         Hispanic   4.0      1    10
2   0.94         Hispanic   4.0      1    1
Imputed Hispanic Firms Est = 8
Imputed Hispanic Receipts = 44
Only RG #1 is non-zero.
Same Estimates. Higher Variances.
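To see why donating the RG number leaves the estimate unchanged but raises the variance, the sketch below applies the textbook random group estimator to the Hispanic receipts in this example. Several things here are assumptions: the slides do not give the variance formula, the number of groups is taken to be k = 5 as in the five-record example, and "higher" is read as a comparison with leaving the imputed record in its own random group.

```python
def group_totals(records, k):
    """Weighted receipts total for each of the k random groups."""
    totals = [0.0] * k
    for weight, rg, receipts in records:
        totals[rg - 1] += weight * receipts
    return totals


def random_group_variance(totals):
    """Textbook random group estimator: each group, scaled by k, estimates the total."""
    k = len(totals)
    full = sum(totals)
    return sum((k * t - full) ** 2 for t in totals) / (k * (k - 1))


K = 5  # assumed number of random groups

# Hispanic records after imputation: (weight, RG #, receipts).
own_rg = [(4.0, 1, 10), (4.0, 2, 1)]      # imputed record keeps its own RG #2
donated_rg = [(4.0, 1, 10), (4.0, 1, 1)]  # imputed record takes the donor's RG #1

totals_own = group_totals(own_rg, K)
totals_donated = group_totals(donated_rg, K)
v_own = random_group_variance(totals_own)
v_donated = random_group_variance(totals_donated)
print(sum(totals_own), sum(totals_donated), v_own, v_donated)
```

Both layouts give the same receipts estimate, but concentrating the imputed record in the donor's group makes the group totals more uneven, which is what the estimator measures.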
19
Advantages of Donating RG #
• No need to add multiple factors to record
• No need to calculate factors
• No problems for microdata users
20
Compare the Ratios of the Variances of the Three Methods
R1 = VAR(Oh-Scheuren) / VAR(Weight Adjustment)
R2 = VAR(Donor) / VAR(Weight Adjustment)
Mean for R1 and R2 across publication cells
Std Dev for each of the means of R1 and R2
Null Hypothesis: Ri = 1 (90% confidence)
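The comparison described above can be sketched as follows. The ratios in the example are made-up values for illustration, not SBO results, and the two-sided normal critical value of 1.645 is an assumption about how the 90% test was carried out.

```python
import math
import statistics


def mean_ratio_vs_one(ratios, critical=1.645):
    """Return (mean, standard error of the mean, True if the mean differs from 1)."""
    mean = statistics.mean(ratios)
    se = statistics.stdev(ratios) / math.sqrt(len(ratios))
    z = (mean - 1.0) / se
    return mean, se, abs(z) > critical


# Hypothetical variance ratios across five publication cells (NOT real data).
example_ratios = [1.10, 1.20, 1.05, 1.15, 1.25]
mean, se, differs = mean_ratio_vs_one(example_ratios)
print(round(mean, 3), round(se, 3), differs)
```

A mean ratio flagged as not different from 1 corresponds to the starred entries in the result tables.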
21
Ratio of Variances --- Firm Counts
* Not Statistically Different from 1.00 at 90%
# Imputes     Oh-Sch/Wt   Donor/Wt
1 to 3        1.148       0.984*
4 to 5        1.176       0.963
6 to 9        1.136       0.941
10 to 19      1.087       1.069
20 to 49      1.069       1.205
50 or more    1.053       1.367
22
Ratio of Variances --- Receipts
* Not Statistically Different from 1.00 at 90%
# Imputes     Oh-Sch/Wt   Donor/Wt
1 to 3        1.230       0.958*
4 to 5        1.286       0.876
6 to 9        1.540       0.963*
10 to 19      1.541       0.914
20 to 49      1.499       0.900
50 or more    1.512       0.951
23
Ratio of Variances --- Firm Counts
* Not Statistically Different from 1.00 at 90%
Response Rate   Oh-Sch/Wt   Donor/Wt
45 to 55%       0.930       1.193
55 to 65%       1.076       1.182
65 to 75%       1.153       1.101
75 to 85%       1.130       1.043
85 to 95%       1.153       1.032*
24
Ratio of Variances --- Receipts
* Not Statistically Different from 1.00 at 90%
Response Rate   Oh-Sch/Wt   Donor/Wt
45 to 55%       1.790       0.902
55 to 65%       1.520       0.904
65 to 75%       1.465       0.940
75 to 85%       1.218       0.945
85 to 95%       1.153       0.954
25
Are the differences acceptable?
Firm Count Variance Ratios Differ by 10%
Receipts Variances Differ up to 70%
=>
Firm Count Relative SEs Differ by about 5%
Receipts Relative SEs Differ by up to 30%
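The step from variance ratios to relative standard errors is a square root, because a standard error is the square root of a variance. A short check of the two figures on this slide (the function name is hypothetical):

```python
import math


def rse_change(variance_ratio):
    """Relative change in the standard error implied by a ratio of variances."""
    return math.sqrt(variance_ratio) - 1.0


firm_count_change = rse_change(1.10)  # variances differ by 10%
receipts_change = rse_change(1.70)    # variances differ by up to 70%
print(round(firm_count_change, 3), round(receipts_change, 3))
```

The results are roughly 0.05 and 0.30, matching the "about 5%" and "up to 30%" on the slide.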
26
Asian-Owned Retail Operations in New Hampshire in 2002
           Estimate   Published RSE   Max Change in RSE
Firms      210        23%             + 1%
Receipts   $70 Mil    19%             + 6%
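The slide does not show how the maximum changes were derived. One reading consistent with the numbers is that the published RSE is scaled by a relative SE difference of about 5% for firm counts and up to 30% for receipts; the sketch below checks that reading and should be taken as an assumption rather than the author's calculation.

```python
def max_rse_change(published_rse_pct, relative_se_difference):
    """Change in RSE, in percentage points, under the assumed scaling."""
    return published_rse_pct * relative_se_difference


firms_change = max_rse_change(23.0, 0.05)     # firm counts: about 5% difference
receipts_change = max_rse_change(19.0, 0.30)  # receipts: up to 30% difference
print(round(firms_change), round(receipts_change))
```

Rounded to whole points, these give the + 1% and + 6% shown in the table.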
27
Lingering Question
Is the donation of the RG Number sufficient or do we need to augment the resulting variance with a factor (similar to the Oh-Scheuren factor)?